Abstract

We seek to develop network algorithms for function computation in sensor networks. Specifically, we want dynamic joint 
aggregation, routing, and scheduling algorithms that have analytically provable performance benefits due to in-network computation 
as compared to simple data forwarding. To this end, we define a class of functions, the Fully-Multiplexible functions, which includes 
several functions such as parity, MAX, and k-th order statistics. For such functions we characterize the maximum achievable refresh
rate of the network in terms of an underlying graph primitive, the min-mincut. In acyclic wireline networks, we show that the 
maximum refresh rate is achievable by a simple algorithm that is dynamic, distributed, and only dependent on local information. 
In the case of wireless networks, we provide a MaxWeight-like algorithm with dynamic flow splitting, which is shown to be 
throughput-optimal.

Index Terms 

In-network function computation, wireless sensor networks, dynamic routing and scheduling algorithms.

I. Introduction 

In-network function computation is one of the fundamental paradigms that increases the efficiency of sensor networks vis-a-vis
conventional data networks. Sensor nodes, in addition to sensing and communication capabilities, are often equipped with basic 
computational capabilities. Depending on the task for which they are deployed, a sensor network can be viewed as a distributed 
platform for collecting and computing a specific function of the sensor data. For example, a sensor network for environment 
monitoring may only be concerned with keeping track of the average temperature and humidity in a region. Similarly 'alarm' 
networks, such as those for detecting forest fires, require only the maximum temperature. The baseline approach for performing 
such tasks is to aggregate all the data at a central node and then perform offline computations; the premise of in-network 
computation is that distributed computation schemes can provide sizable improvement in the performance of the network. 
However, from the perspective of designing network algorithms, in-network function computation poses a greater challenge 
than data networks as the freedom to combine and compress packets, as long as the desired information is preserved, destroys 
the flow conservation laws central to data networks. The network has a lot more flexibility, so much so as to make quantifying 
its performance much more challenging.

Our focus in this paper is to develop a queue-based framework for such systems, and use it to design and analyze network 
algorithms. By network algorithms, we refer to cross layer algorithms that jointly perform the following tasks: 

1) Aggregating the data at nodes via in-network computation, 

2) Routing packets between nodes, and 

3) Scheduling links between nodes for packet transmission. 

Cross-layer algorithms for data networks, although very successful in both theory and increasingly in real-system
implementation, are concerned only with the scheduling and routing aspects. Hence, there is a need for a new framework and new
algorithms for in-network function computation in sensor networks. Keeping in mind the lessons learnt from the success of data 
networks, our aim is to design network algorithms that are dynamic (i.e., the algorithm should not be designed assuming static 
network parameters, but rather, use the network state to adaptively learn the network parameters), robust (i.e., the algorithm 
adapts to temporal changes in traffic and network topology), capable of dealing with a large class of functions (i.e., if the 
function being computed by the network changes, then one should only need to make minor changes to the scheduling and 
routing algorithms), and generalizable to all network topologies. 

Due to the wide range of potential applications, there are many existing models for such networks, and many different 
perspectives from which they are analyzed. Some representative works in this regard are as follows: 
• The pioneering work of Giridhar and Kumar [1] considers the function computation problem from the point of view of
the capacity scaling framework of Gupta and Kumar [2]. In particular, they quantify scaling bounds for certain classes of
functions (symmetric, divisible, type-sensitive, and type-threshold) under the protocol model of wireless communications 
and for collocated graphs and random geometric graphs. 
• Other papers consider the function computation problem from the point of view of information theory [?] and
communication complexity [5], [7], characterizing various metrics (refresh rate, number of messages, etc.) for different
functions in terms of certain properties of the graph, the function to be computed, and the underlying data correlation. All 
the above works take an idealized 'bottom-up' approach to determine the fundamental limits of the problem, and hence 
are not directly suitable for designing practical network algorithms. 

• Similar in spirit to the above papers, another approach is to study function computation from the perspective of source
coding [6], [?]. These works characterize bounds and show the existence of coding schemes for noiseless, wireline networks.
As with the previous algorithms, these policies tend to be idealized, using more complex coding-based schemes instead of 
simple routing and aggregation (we will later show that such simple strategies suffice for optimal in-network computation 
of a number of functions of interest); further, these papers do not have explicitly defined policies but rather existence 
results for such policies. 

• In contrast to the 'bottom-up' approach of all the above works, Krishnamachari et al. [8] adopt a more 'top-down'
approach whereby they formulate network models that abstract out some of the complexity while allowing quantification 
of performance gains (in their case, energy and delay). Their models do not, however, achieve the optimal throughput and 
also do not allow for the design of dynamic network algorithms. 

• An alternate model of sensor networks is to assume that nodes are capable of in-network compression, wherein only the
compression (and not merging) of flows is permitted. For example, Baek et al. [10] consider routing algorithms for power
savings in hierarchical sensor networks. Similarly, Sharma et al. design energy-efficient queue-based algorithms under
the assumption that the only operation allowed in addition to routing and scheduling is compression of packets at the
source node.

The queue-based model for data networks has proved to be an essential tool in designing provably-efficient algorithms for
such systems. This model has provided a common framework for understanding various aspects of data network performance
such as throughput [12], [13], delay [14], [15], flow utility maximization [16]-[18], network utility maximization [19], [20],
and distributed algorithms [21], among others (for an overview, refer to [22]). In addition, these algorithms have been implemented
in real systems [23], [24], including in sensor networks [25], with good results. However, these algorithms are designed for
data networks, and cannot exploit any potential benefit from in-network computation. More recently, this framework has been
extended to fork-and-join networks with fixed routing [26], and resource allocation in processing networks [27].

Using fixed routing in a network usually leads to suboptimal operations as the routes may not be designed to optimize the
network performance; in general, even choosing the single best fixed route can perform arbitrarily worse than dynamic
routing (see the example in Section III). Further, static routing is not robust to temporal changes in the network. However,
introducing dynamic routing with in-network computation destroys the flow conservation equations that exist in data networks 
and networks with fixed flows, as the flow out of a node depends both on inflow as well as (dynamic) packet aggregation at that 
node. Thus, there is a need to come up with a new queue-based framework and algorithms for efficient function computation 
in sensor networks, and our paper is a step in this direction. 

A. Main Contributions 

Our main contributions in this paper are as follows: 

• We identify a class of functions, the Fully-Multiplexible or FMux functions, for which we provide a tight characterization
of the maximum refresh rate with in-network computation, i.e., the maximum rate at which the sensors can generate data
such that the computation can be performed in a stable^1 manner. More formally, we show that for these functions, if the
refresh rate exceeds a certain graph parameter (the stochastic min-mincut, which we define formally in Section III), then
the system is transient under any algorithm, whereas for any rate lower than this parameter, we construct a policy that can
stabilize the system.

• Leveraging the results of Massoulie et al. [28] on broadcasting, we obtain a wireline routing algorithm for aggregation
via in-network computation of FMux functions in directed acyclic graphs. Our approach is based on the observation
that broadcasting and aggregation are, in some sense, duals of each other. More technically, the duality occurs between
'isolation' of packets in aggregation (i.e., a packet does not have neighboring packets to aggregate with) and
multiple receptions of the same packet (from different neighbors) in broadcasting. By suitably modifying the approach
in [28], we are able to develop an in-network aggregation algorithm for which routing is completely decentralized, and
simple random packet forwarding and aggregation suffices for throughput-optimality.

• For general wireless networks we develop dynamic algorithms based on a centralized allocation of routes (dynamic
flow splitting) and MaxWeight-type scheduling. In particular, we show that loading rounds on trees in a greedy manner 
(whereby an incoming round is loaded on the least weighted aggregation tree), coupled with an appropriate scheduling 
rule, is throughput-optimal for computing FMux functions. The analysis of this algorithm is unique in that in addition to 
an appropriate Lyapunov function, it requires the construction of appropriate tree-packings of the network graph in order 
to show the throughput-optimality of this routing scheme. 

^1 By stability, we refer to the standard notion of the existence of a stationary regime for the queueing process [13], [22].

Notation: Throughout the paper, we use calligraphic fonts ($\mathcal{Q}$, $\mathcal{A}$, etc.) to denote sets and the corresponding capital letter
($Q$, $A$, etc.) to denote their cardinality. We interchangeably use $\cup$ or $+$ for adding elements to sets, and $-$ for deleting elements
from sets, and sometimes for brevity of exposition, use the element $u$ to denote the singleton set $\{u\}$ when the meaning
is clear from the context (in particular, for a set $S$ and element $u$, $S + u \triangleq S \cup \{u\}$). We also use the shorthand notation
$[N] \triangleq \{1, 2, \ldots, N\}$.

II. System Model and Function Classes 

In this section we describe the system model we study in the rest of the paper. At a high level, the system consists of a 
network of N nodes, one of which is the data aggregator and the rest are sensors. Sensor nodes are capable of three tasks: 
sensing the environment, transmitting to and receiving data from other nodes, and performing computations on the data. The 
sensors are assumed to sense the environment in a synchronous manner, and the overall purpose of the system is to compute a 
specific function of the synchronously generated sensor data and forward it to the aggregator. Further, the function computation 
is assumed to be done in a repeated manner, and the metric used to quantify the efficacy of an algorithm is the maximum 
synchronous rate at which the sensors can generate data such that the required function of the data can be forwarded to the 
aggregator in a stable manner. This rate is henceforth referred to as the maximum refresh rate of the network.

Before we describe the queueing framework for function computation, we first outline the general communication model 
that we consider in this work. This model is the same as that considered for studying data networks [13]. In the next section,
we will outline the modifications we make in order to capture the in-network computation aspect of a sensor network. 

Communication Graph: We model the topology of the sensor network as a directed graph $G(\mathcal{N}, \mathcal{L})$, where $\mathcal{N}$ is a set of $N$
nodes, and $\mathcal{L}$ is a set of $L$ directed links which determine the connectivity between nodes. There is a special node, $a \in \mathcal{N}$,
referred to as the aggregator, and the rest of the nodes in $\mathcal{N}$ are sensor nodes. Directed link $(u, v) \in \mathcal{L}$ represents that there
exists a communication channel from node u to node v (in wireline this corresponds to a physical channel, while in wireless 
it represents the fact that the nodes are in radio range). 



Transmission Model: Following the convention in the literature [13], [28], we consider a continuous-time system in the case of
wireline systems, whereas in the case of wireless networks, we assume that time is slotted, and state all rates in bits per slot.
In wireline networks, we define a vector of link rates $\mathbf{c} = \{c_{uv}\}_{(u,v) \in \mathcal{L}}$. Packets are assumed to traverse a link $(u,v) \in \mathcal{L}$ with
a random transit time with distribution Exponential($c_{uv}$). The transit times are independent across links and across packets
crossing the same link. 

For wireless networks, we make the following assumptions/definitions [19]:

• We assume that the channels between nodes are constant (but can extend to time-varying channels with added notation;
see [19]). The wireless nature of the network is reflected in the interference constraints.

• For a transmission schedule $I \in 2^{\mathcal{L}}$, $\mathbf{c}(I) = \{c_{uv}(I)\}$ denotes the link-rate vector of transmission rates over the links under
the chosen schedule.

• $\mathcal{I} \subseteq 2^{\mathcal{L}}$ is the set of valid schedules that obey the interference constraints (henceforth referred to as independent sets). $\mathbf{c}(I)$
is said to be admissible if the link-rates can be achieved simultaneously in a time slot. $\Gamma = \{\mathbf{c}(I) : I \in \mathcal{I}\}$ is the set of all
admissible $\mathbf{c}(I)$ and is assumed to be time invariant as stated above. Further, we assume that $c_{uv}(I) \le c_{\max}\ \forall\, (u,v) \in \mathcal{L}$.

• $\mathbf{c}$ is said to be obtainable if $\mathbf{c} \in \mathcal{CH}(\Gamma)$, the convex hull of $\Gamma$. An obtainable link-rate vector can be achieved by time
sharing over admissible link-rate vectors.

• From the definition of the convex hull, we have that for every obtainable rate vector $\mathbf{c} \in \mathcal{CH}(\Gamma)$, there exists a probability
measure $\pi$ over $\mathcal{I}$ such that $\mathbf{c} \le \{\sum_{I \in \mathcal{I}} \pi(I) c_{uv}(I),\ (u, v) \in \mathcal{L}\}$. The vectors $\pi$ are called Static Service Split (or
SSS) rules [13], and represent time sharing fractions between different independent sets in order to achieve the rate $\mathbf{c}$.
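
To make the time-sharing interpretation of an SSS rule concrete, the following minimal Python sketch computes the link-rate vector obtained from a given rule. The two-link line network, the schedule names `I1` and `I2`, and the helper `time_shared_rates` are illustrative assumptions and not part of the model above.

```python
from fractions import Fraction


def time_shared_rates(schedule_rates, pi):
    """Return the link-rate vector sum_I pi(I) * c(I) for an SSS rule pi."""
    if sum(pi.values()) != 1 or any(p < 0 for p in pi.values()):
        raise ValueError("pi must be a probability measure over schedules")
    rates = {}
    for schedule, weight in pi.items():
        for link, rate in schedule_rates[schedule].items():
            rates[link] = rates.get(link, 0) + weight * rate
    return rates


# Hypothetical line network u -> v -> a in which the two links interfere,
# so each valid schedule activates exactly one link at 2 bits per slot.
schedule_rates = {
    "I1": {("u", "v"): 2},
    "I2": {("v", "a"): 2},
}
pi = {"I1": Fraction(1, 2), "I2": Fraction(1, 2)}
rates = time_shared_rates(schedule_rates, pi)
print(rates)
```

Under these assumptions, splitting time equally between the two schedules yields an average rate of one bit per slot on each link.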

Up to this point, the system is identical to one considered for data networks. To highlight the unique features of a physical 
sensor network performing function computation (and how they affect the modeling of such a system), we consider the following 
example. In the process, we also indicate the gains achievable via in-network computation versus data download and processing 
at the aggregator. 



Example 1: Consider a grid of N temperature sensors, with a single aggregator at the center, engaged in recording the 
maximum temperature over these sensor readings. Each node is connected to its four immediate neighbors in the grid via 
links with a fixed capacity c. Every node senses the temperature synchronously, and the aggregator desires the MAX of these 
synchronous measurements. Suppose the network operates by transferring all the data to the aggregator, and then calculating 
the MAX offline; the maximum rate at which the measurements can be made is then $O(1/N)$, as all the packets must pass
through one of the 4 links entering the aggregator. On the other hand, if we allow in-network computation, wherein nodes on
receiving multiple packets can discard all but the one with highest value, then the network can operate at a rate of $O(1)$, as the
bottleneck is now the minimum-cut of the graph (again the 4 links entering the aggregator). In subsequent sections, we show 
that for certain functions like MAX, and any network, the maximum possible refresh rate can be related thus to minimum-cuts 
in the network. Further, there are dynamic algorithms that support rates up to the maximum refresh rate. 
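
As a rough numerical illustration of Example 1, the sketch below compares the two bounds that arise from the cut formed by the links entering the aggregator. The function names and sample values are illustrative assumptions, and the second bound accounts only for this one cut; other cuts in a particular grid may be tighter.

```python
def forwarding_rate_bound(num_sensors, links_into_aggregator, link_capacity):
    """Rounds per unit time when every raw packet must reach the aggregator."""
    # Each round contributes one packet per sensor, and all of them cross the cut.
    return links_into_aggregator * link_capacity / num_sensors


def aggregation_rate_bound(links_into_aggregator, link_capacity):
    """Rounds per unit time when same-round packets are merged in the network."""
    # With merging along aggregation trees, a link carries at most one packet per round.
    return links_into_aggregator * link_capacity


for n in (100, 10_000):
    print(n, forwarding_rate_bound(n, 4, 1.0), aggregation_rate_bound(4, 1.0))
```

The first bound shrinks as the number of sensors grows, while the second does not depend on it.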

Keeping this example in mind, we now outline the rest of our system model.

Traffic Model: We consider a symmetric arrival rate model, where each sensor node senses the environment synchronously
at a rate $\lambda$ (the refresh rate of the network). The aim of a network algorithm is to support the maximum possible $\lambda$ while
ensuring that the network is stable.

In the case of wireline networks, packets are generated synchronously at all nodes $i \in \mathcal{N}$ following a Poisson process with rate
$\lambda$. In the case of wireless networks, the arrival process $A_i[t]$ in time slot $t$ consists of a random number of packets per time slot
generated in a synchronous manner, i.e., $A_i[t] = A_j[t] = A[t]\ \forall\, i, j \in \mathcal{N}$, and further $A[t]$ is i.i.d. across time. In this case,
we define the refresh rate as

$$\lambda = \mathbb{E}[A[t]],$$

and also assume that $A[t]$ has a finite second moment,^2 which we denote as $m_2 = \mathbb{E}[A[t]^2]$.

We associate all simultaneously generated packets with a unique identifier called the round number, which represents the time 
when the packet was generated. In particular, we follow a scheme whereby we number the packets sequentially in ascending
order of their generation times, and update the round numbers when packets complete being aggregated (thus the oldest
unaggregated packet in the network always has the smallest round number, and so forth). This scheme of round number allocation is
henceforth referred to as the generation-time ordering. 

Now in order to develop a queueing model, we need a framework to capture the data aggregation operations. As mentioned 
before, our primary goal is to explore the benefits of in-network computation versus data-download. To this end, we restrict our 
attention to a specific class of functions, which we define as the FMux functions, and for which we can exactly quantify the 
gains from in-network computation. The intuition behind the FMux class is that these functions support maximum compression 
upon aggregation; when two (or more) packets combine at a node, the resultant packet has the same size as the original 
packets. We now define it formally. 

Computation Model: We assume that the function $f$ is divisible [1]. Formally, we assume that each sensor records a value
belonging to a finite set $\mathcal{X}$, and we have a function $f$ of the sensor values that needs to be computed at the aggregator $a$. We
use $f_k$ to denote the function $f$ operating on $k$ inputs, i.e., $f_k : \mathcal{X}^k \to \mathcal{R}(f, k)$, where $\mathcal{R}(f, k)$ denotes the range of the function
when it takes $k$ inputs. Then the function $f$ is said to be divisible if:

1) $|\mathcal{R}(f, k)|$ is non-decreasing in $k$, and

2) Given any partition $\Pi(S) = \{S_1, S_2, \ldots, S_j\}$ of $S \subseteq [n]$, there exists a function $g^{\Pi(S)}(\cdot)$ such that for any $\mathbf{x} \in \mathcal{X}^n$:

$$f(\mathbf{x}_S) = g^{\Pi(S)}\big(f(\mathbf{x}_{S_1}), f(\mathbf{x}_{S_2}), \ldots, f(\mathbf{x}_{S_j})\big).$$

Intuitively, for any partition of the nodes, / can be computed by performing a local computation over each set in the partition, 
and then aggregating them. 
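
The divisibility property can be checked numerically on small inputs. The sketch below is a minimal illustration for parity and MAX, assuming a hypothetical helper `compute_via_partition` and made-up sensor readings; for these two functions the combining function happens to coincide with the function itself, which need not hold for divisible functions in general.

```python
from functools import reduce
from operator import xor


def parity(values):
    """XOR of a list of bits."""
    return reduce(xor, values, 0)


def compute_via_partition(f, g, x, partition):
    """Apply f to each block of the partition, then combine the results with g."""
    return g([f([x[i] for i in block]) for block in partition])


bits = [1, 0, 1, 1, 0, 1]
temps = [21, 35, 28, 19, 35, 30]
partition = [[0, 3], [1, 2, 4], [5]]   # a partition of the index set {0, ..., 5}

parity_direct = parity(bits)
parity_split = compute_via_partition(parity, parity, bits, partition)
max_direct = max(temps)
max_split = compute_via_partition(max, max, temps, partition)
print(parity_direct, parity_split, max_direct, max_split)
```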

We define a function $f$ to be Fully-Multiplexible or FMux if $\mathcal{R}(f, k) = \mathcal{R}(f, j) = \mathcal{R}(f)$ for all $j, k \in [n]$. In other words,
the output of an FMux function lies in the same set independent of the number of inputs. Some important examples of FMux
functions are MAX, k-th order statistics, parity, etc. As mentioned before, in this work we will focus on FMux functions as
they most clearly exhibit the effects of in-network computation (in that we have tight bounds for their refresh rate). 

As a representative example of FMux functions for defining the queueing model and algorithms, consider computation of 
the parity of the sensor readings: $\mathcal{X} = \{0, 1\}$, $f(\{x_1, x_2, \ldots, x_N\}) = x_1 \oplus x_2 \oplus \cdots \oplus x_N$, where $\oplus$ represents the binary XOR
operator. Upon sensing, node $i$ stores the value $x_i$ as a packet of size $\log |\mathcal{X}| = 1$ bit. Next, when two or more packets of the
same round arrive at a node, they are combined using the XOR operation. Finally, the aggregator obtains the parity by taking 
XOR of all the packets of a given round that it receives. We now develop a queueing model for FMux functions. 

Queueing Model: Each node maintains a queue of packets corresponding to different rounds. For node $i$, $\mathcal{Q}_i[t] \in 2^{\mathbb{N}_0}$ is a
subset of $\mathbb{N}_0$ representing the round numbers of all packets queued up at that node. We also define $Q_i[t] = |\mathcal{Q}_i[t]|$.

When a packet corresponding to round r arrives at node i from any other neighboring node, it is combined with node i's 
own packet corresponding to round r to result in a single packet of the same size (using the FMux property in general, e.g. 
by taking XOR for parity). In the case where node i does not have a packet of round r in queue, it needs to store the new 
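packet. Formally, upon arrival of a packet of round $r$ in time slot $t$ (and ignoring other arrivals and departures), the queue is
updated as follows:

$$\mathcal{Q}_i[t+1] = \mathcal{Q}_i[t] + r\mathbb{1}_{\{r \notin \mathcal{Q}_i[t]\}},$$

where we use $r\mathbb{1}_{\{r \notin \mathcal{Q}_i[t]\}}$ as shorthand for '$r$ if $r \notin \mathcal{Q}_i[t]$, else $\emptyset$'. The complete queue update in a time slot is obtained by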
extending this definition for all arrivals, and by removing any departing packets from the queue. 
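
The per-node queue dynamics described above can be sketched as follows. This is a minimal illustration for the parity example, assuming a hypothetical `NodeQueue` class that keys packets by round number; it is not meant as the authors' implementation.

```python
class NodeQueue:
    """Packets held at one node, keyed by round number (hypothetical helper)."""

    def __init__(self, combine):
        self.combine = combine   # FMux merge of two packets of the same round
        self.packets = {}        # round number -> packet value

    def arrive(self, round_no, value):
        if round_no in self.packets:
            # Same round already queued: merge, so the queue length is unchanged.
            self.packets[round_no] = self.combine(self.packets[round_no], value)
        else:
            # No packet of this round in queue: the new packet must be stored.
            self.packets[round_no] = value

    def depart(self, round_no):
        return self.packets.pop(round_no)

    def rounds(self):
        return set(self.packets)

    def __len__(self):
        return len(self.packets)


q = NodeQueue(combine=lambda a, b: a ^ b)   # XOR merge for parity
q.arrive(1, 1)   # the node's own reading for round 1
q.arrive(2, 0)   # the node's own reading for round 2
q.arrive(1, 1)   # round-1 packet from a neighbor, merged in place
q.arrive(3, 1)   # round-3 packet from a neighbor, stored as a new entry
print(q.rounds(), len(q))
```

Here the set returned by `rounds()` plays the role of the set of queued round numbers, and `len(q)` that of its cardinality.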

Suppose further that the round number allocation is done according to the generation-time ordering scheme described before;
then the system described above forms a Markov chain under any stationary scheduling and routing algorithm. Further, it can
be shown that this chain is irreversible and aperiodic. We will now focus on the above queueing dynamics for the design of
scheduling and routing algorithms.

^2 Note that this assumption is not the most general possible restriction on the input process, but one that we choose for convenience of exposition. For more
general conditions on the arrival process, refer to [13].

We should note here that the queueing model we have described above accounts only for routing and aggregation of packets 
belonging to the same round. We have not allowed packets from different rounds to be combined together in any way, thereby 
negating the possibility of block coding and network coding. In the case of parity, it is known however that there is no 
improvement possible by using schemes with block/network coding [?], [29].

III. Maximum Refresh Rate and Tree Packing 

Given the above queueing model, it is unclear what routing structures are required for efficient in-network computation. 
Existing routing-based approaches for function computation [8], [26], [30] often assume that routing is done on a single
aggregation tree, where each node aggregates data from its children before relaying it to its parent. However, it is not a priori
evident that a single optimal tree, or a collection of optimal trees exists (or indeed that acyclic aggregation structures are 
sufficient), and if it does, how it can be found dynamically. 

In this section, we derive an algorithm-independent upper bound on the refresh rate for FMux computation. By focusing 
on the flow of information from sensor nodes to the aggregator, we are able to express the bound in terms of an underlying 
graph primitive, the min-mincut of the graph. Next we construct a class of throughput-optimal randomized policies, thereby
obtaining a tight characterization of the maximum refresh rate. In the process, we show the existence of an optimal collection 
of aggregation trees. To understand the import of this result, consider the following example. 



Example 2: Consider a wired network G consisting of the complete graph on N nodes, with every edge having capacity 1. If 
we use a single aggregation tree for routing, then the maximum possible refresh rate for computing the parity function is 1, as 
every edge is a bottleneck. However, by using a collection of aggregation trees (in fact, it can be shown that a particular set 
of $N - 1$ trees are sufficient), one can achieve a refresh rate of $N - 1$. This, as we will show in the next section, is optimal
as it turns out to match the min-mincut of the graph. 



Keeping this in mind, we now characterize the maximum refresh rate for FMux computation in general graphs. 

A. An upper bound on refresh rate for FMux computation 

We now state an upper bound for the refresh rate under which the network can be stabilized under any algorithm. We state 
this theorem for wireless networks, as an equivalent theorem for wireline networks can be obtained as a special case. 
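
Given a rate vector $\mathbf{c} \in \mathcal{CH}(\Gamma)$ and any node $i \in \mathcal{N}$, we define the min-cut between the node $i$ and the aggregator $a$ as:

$$\delta_i(\mathbf{c}) = \min_{S \subset \mathcal{N} :\, i \in S,\, a \notin S}\ \sum_{(u,v) \in \mathcal{L} :\, u \in S,\, v \notin S} c_{uv}.$$

Further, we define the min-mincut of the network under rate vector $\mathbf{c} \in \mathcal{CH}(\Gamma)$ as

$$\delta^*(\mathbf{c}) = \min_{i \in \mathcal{N} \setminus \{a\}} \delta_i(\mathbf{c}).$$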
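
For a fixed rate vector, the min-mincut can be computed with one max-flow computation per sensor node, since the min-cut between a node and the aggregator equals the corresponding maximum flow. The sketch below uses a textbook Edmonds-Karp routine; the function names and the two test networks are illustrative assumptions.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; capacity maps directed edges (u, v) to rates."""
    residual = dict(capacity)
    adj = {}
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck


def min_mincut(capacity, aggregator):
    """Minimum over sensor nodes of the min-cut to the aggregator."""
    nodes = {node for edge in capacity for node in edge}
    return min(max_flow(capacity, i, aggregator) for i in nodes if i != aggregator)


# Complete directed graph on 5 nodes with unit capacities (as in Example 2).
n = 5
complete = {(u, v): 1 for u in range(n) for v in range(n) if u != v}
print(min_mincut(complete, aggregator=0))

# Hypothetical line network u -> v -> a with a weak last hop.
line = {("u", "v"): 3, ("v", "a"): 1}
print(min_mincut(line, aggregator="a"))
```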

Then we have the following lemma. 

Lemma 1. Upper Bound on refresh rate: Consider a network performing FMux computation. A refresh rate of $\lambda$ cannot be
stabilized by any routing and scheduling algorithm if

$$\lambda > (\log |\mathcal{R}(f)|)^{-1} \max_{\mathbf{c} \in \mathcal{CH}(\Gamma)} \delta^*(\mathbf{c}).$$

We note here that the capacities of the links are given in bits per time slot, while the refresh rate $\lambda$ is in terms of packets
per time slot. The $(\log |\mathcal{R}(f)|)^{-1}$ factor is to convert link capacities into packets per time slot, and is henceforth present in
all bounds for the refresh rate. 

Proof: The proof follows from tracing the steady state flow of packets from any sensor node to the aggregator. More
specifically, for a refresh rate $\lambda$, suppose the network is stabilized by some algorithm. Then the Markov chain described by
the packets in the network (under the generation-time ordering round number allocation, as described above) has a stationary
regime. Further, due to the network constraints, the average service rate on each edge of the network in the stationary regime
is given by some $\mathbf{c} \in \mathcal{CH}(\Gamma)$ (in bits per slot).

Next, under the stationary regime, for a sensor node $i \in \mathcal{N}$, we can trace the packets as they travel from node $i$ to the
aggregator (in order to do so, we start tracing a packet when generated at i, and subsequently whenever that packet is aggregated, 
we trace the aggregated packet). Now for every directed path from i to a, we obtain an average flow of packets which travel 
along that path. This gives us a set of flows from $i$ to $a$. Due to the unchanging packet size (due to the FMux assumption), the
sum of these flows is equal to $\lambda$. However, due to the network constraints, the sum of flows on an edge $(u, v)$ is less than or
equal to $(\log |\mathcal{R}(f)|)^{-1} c_{uv}$ packets per slot, and thus by the max-flow min-cut theorem, $\lambda$ is less than or equal to the minimum $i$-$a$ cut with edge capacities
given by $\mathbf{c}$, scaled by the same factor. Now since this is true for any node $i$, we have that $\lambda \le (\log |\mathcal{R}(f)|)^{-1} \delta^*(\mathbf{c})$. Maximizing over all $\mathbf{c} \in \mathcal{CH}(\Gamma)$,
we get our result by contradiction. ■



B. An optimal class of randomized scheduling/routing policies 

From Lemma 1, it is evident that the min-mincut of the graph (under an appropriate SSS rule) is the bottleneck for computing
an FMux function. Now we can use a classical theorem of Edmonds in order to simplify the space of policies we need to 
consider. We state the theorem in its original form whereby it is applicable to a one-to-all network broadcast scenario (informally, 
a directed graph with a special source node, where the aim is to transmit the same packets from the source to all the nodes in 
the network). However given a sensor network, we can apply Edmonds' Theorem on it by reversing the directions of all the 
edges while keeping their capacities the same. 

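Consider a directed graph $G(\mathcal{N}, \mathcal{L})$ with a distinguished source node $s \in \mathcal{N}$, and suppose each edge $(u, v) \in \mathcal{L}$ of the graph
is associated with a capacity $c_{uv} > 0$. As before, the min-mincut of the graph $G$ is defined as:

$$\delta^*(G) = \min_{i \in \mathcal{N} \setminus \{s\}}\ \min_{S \subset \mathcal{N} :\, i \in S,\, s \notin S}\ \sum_{(u,v) \in \mathcal{L} :\, u \notin S,\, v \in S} c_{uv}.$$

Let $\mathcal{T}$ be the set of all spanning trees of $G$ rooted at $s$ (i.e., every $\tau \in \mathcal{T}$ is a spanning tree with $s$ as the first element in its
topological order). The max-spanning-tree-packing number, $\lambda^*(G)$, is defined to be the solution to the following optimization
problem:

$$\begin{aligned}
\text{Maximize} \quad & \sum_{\tau \in \mathcal{T}} \lambda_\tau \\
\text{subject to} \quad & \sum_{\tau \in \mathcal{T} :\, (u,v) \in \tau} \lambda_\tau \le c_{uv} \quad \forall\, (u,v) \in \mathcal{L}, \\
& \lambda_\tau \ge 0 \quad \forall\, \tau \in \mathcal{T}.
\end{aligned}$$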

Then we have the following theorem. 

Theorem 1. (Edmonds, 1972 [31]) For a directed graph $G(\mathcal{N}, \mathcal{L})$ with distinguished source vertex $s$ and edge capacities
$c_{uv}$, $(u,v) \in \mathcal{L}$, the min-mincut $\delta^*(G)$ is equal to the max-spanning-tree-packing number $\lambda^*(G)$.
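
As a small sanity check of Theorem 1, the sketch below builds one explicit tree packing for the complete directed graph on $N$ nodes with unit capacities. The star-shaped construction (the source sends to a hub, and the hub sends to everyone else) is our own illustrative choice, not necessarily the set of trees the authors have in mind; it yields $N - 1$ edge-disjoint spanning trees rooted at the source, each carrying unit weight, which matches the min-mincut of $N - 1$. For aggregation, the same trees would be used with all edge directions reversed.

```python
def star_arborescences(n, root=0):
    """One spanning tree per non-root hub: root -> hub -> every other node."""
    trees = []
    for hub in range(n):
        if hub == root:
            continue
        tree = {(root, hub)} | {(hub, j) for j in range(n) if j not in (root, hub)}
        trees.append(tree)
    return trees


def is_spanning_arborescence(tree, n, root=0):
    """Check that every non-root node has one parent and a path back to the root."""
    parents = {}
    for u, v in tree:
        if v == root or v in parents:
            return False
        parents[v] = u
    if len(parents) != n - 1:
        return False
    for start in parents:
        seen = set()
        v = start
        while v != root:
            if v in seen:
                return False   # cycle detected
            seen.add(v)
            v = parents[v]
    return True


def edge_load(trees, weights):
    """Total packing weight placed on each edge."""
    load = {}
    for tree, weight in zip(trees, weights):
        for edge in tree:
            load[edge] = load.get(edge, 0) + weight
    return load


n = 5
trees = star_arborescences(n)
weights = [1] * len(trees)
load = edge_load(trees, weights)
print(len(trees), max(load.values()), sum(weights))
```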

Edmonds' Theorem guarantees the existence of a tree packing which has the same weight as the min-mincut of the graph. 
Now in the case of one-to-all broadcast in networks, wherein a node can transmit copies of any packet it has received, it is 
clear that the subgraph traced out by a packet in reaching all nodes forms a tree. Returning to the wireless setting, we now 
sketch out how to construct a randomized routing and scheduling algorithm that is throughput-optimal, using the technique 
developed by Andrews et al. [13]. Suppose we know the point $\mathbf{c}^* \in \mathcal{CH}(\Gamma)$ in the obtainable rate region which maximizes the
min-mincut;^3 then we can schedule according to the corresponding SSS rule to achieve an ergodic rate of $c^*_{uv}$ across any link
$(u, v)$. The network is now converted into a wired network, i.e., with edges having fixed capacities. Next we can use Edmonds'
Theorem to obtain a tree packing for this fixed-capacity network, which determines how the input flow should be balanced 
between spanning trees. Combining these two steps, we obtain a scheme whereby we split the incoming flow according to the 
tree packing, and schedule using the SSS rule corresponding to c* to stabilize the system. By a similar argument, we can obtain 
a tree packing given the optimal SSS rule for the FMux function computation problem. Here each round is associated to a 
spanning tree such that the total incoming flow (which is equal to the refresh rate) is split according to the above tree packing. 
This tree is henceforth referred to as the aggregation tree of the round, and determines the route followed by the packets in 
that round. The routing thus taken care of, the scheduling is done according to the optimal SSS rule, and in combination, they 
stabilize the network. Combined with Lemma 1, this gives a tight characterization of the maximum refresh rate of the network,
which we state in the following theorem. 

Theorem 2. Consider a network performing in-network computation for an FMux function $f$. The maximum refresh rate is
defined as:

$$\lambda^* = (\log |\mathcal{R}(f)|)^{-1} \max_{\mathbf{c} \in \mathcal{CH}(\Gamma)} \delta^*(\mathbf{c}).$$

Then a refresh rate of $\lambda$ cannot be stabilized by any algorithm if $\lambda > \lambda^*$, and there exists a static, randomized algorithm to
stabilize it if $\lambda < \lambda^*$.
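
The routing half of such a static randomized policy can be sketched as follows: each incoming round is assigned to an aggregation tree with probability proportional to that tree's weight in the packing. The tree identifiers, the weights, and the function name are hypothetical, and the sketch assumes that a single assignment per round is shared by all nodes; scheduling according to the SSS rule is not shown.

```python
import random


def assign_rounds_to_trees(num_rounds, tree_weights, seed=0):
    """Pick an aggregation tree for each round, proportionally to its weight."""
    rng = random.Random(seed)
    tree_ids = list(tree_weights)
    weights = [tree_weights[t] for t in tree_ids]
    return rng.choices(tree_ids, weights=weights, k=num_rounds)


# Hypothetical packing: tree t1 carries twice the flow of t2, and t3 is unused.
tree_weights = {"t1": 2.0, "t2": 1.0, "t3": 0.0}
assignment = assign_rounds_to_trees(3000, tree_weights)
share_t1 = assignment.count("t1") / len(assignment)
print(share_t1)
```

Over many rounds, the fraction of rounds loaded on each tree approaches its share of the total packing weight.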

We note that this bound, and the definition of FMux functions, is similar in spirit to the results in [6]. Theorem 2 is different
from (and more general than) the results obtained in [6], both in scope and technique. More generally, there is a fundamental
difference in the level of abstraction with which we view the problem vis-a-vis other similar works such as [5]-[7], where
the focus is on the physical/link layers, and further, only for wired networks. Our result is for network layer algorithms for
a more general class of networks (wired and wireless); furthermore, the algorithm based on SSS rules is an explicit (albeit
static) algorithm, and uses only routing and packet aggregation at nodes. In contrast, Appuswamy et al. [6] use results that show
the existence of source-coding-based schemes (which are more complex than the routing-based schemes we use) that achieve the
min-mincut in noiseless, wired networks.

^3 Note that such an optimal rate point exists as the min-mincut is a continuous function of the rates, which lie in a compact set $\mathcal{CH}(\Gamma)$.

The problem with such a static algorithm is that it needs prior calculation of the min-mincut and the associated optimal rate
point (to obtain the optimal packing of aggregation trees). A better alternative is to use the queues as a proxy for learning
these through dynamic algorithms based on the current system state (similar to the Backpressure algorithm [12], [13] for data
networks). The rest of the paper deals with the development of such algorithms.



IV. Routing with Random Packet Forwarding in Wired Networks 

In this section we give a routing algorithm for acyclic wired networks based on random packet forwarding with aggregation.
This algorithm is based on an algorithm for one-to-all network broadcast in wireline networks by Massoulie et al. [28], which
demonstrates that random 'useful'-packet forwarding achieves the min-mincut bound. We modify their approach to obtain a
dual version applicable to FMux computation in wireline networks.

In in-network FMux computation, as described before, a new round of packets arrives at all sensor nodes in a synchronous
manner, and needs to be routed to the aggregator. For the broadcast problem (where packets arrive at the source and need to be
routed to all other nodes), an optimal algorithm [28] is as follows: for any idle link in the network, the source node randomly
picks a packet that the receiver does not have (defined as a 'useful' packet) and transmits it on that link. We now define an
analogous notion of a useful packet for in-network aggregation, and show how it can be used to derive an optimal random
routing algorithm for FMux function computation.

A natural invariant in broadcast is that the trace of a round of packets always follows a spanning tree. This is not in general
true in aggregation; however, in the case of acyclic networks, one can impose additional constraints to ensure that a transmission
does not lead to an isolated packet, i.e., a packet at a node such that no neighboring node has a packet from the same round,
thus preventing its aggregation. This can be ensured by defining an appropriate notion of a 'useful' packet and only transmitting
useful packets. We define a packet in node $i$ to be useful to neighbor $j$ if (a) $j$ has a packet of the same round (hence ensuring
aggregation); and (b) transferring the packet to $j$ does not result in an isolated neighbor $k$ of $i$. The routing algorithm now
consists of randomly forwarding useful packets whenever a link is idle. In Appendix A we prove that this definition leads to
packets being routed on spanning trees.

Formally, the algorithm is a work-conserving policy whereby each node $i \in \mathcal{N}$ ensures that an outgoing edge
$(i,j) \in \mathcal{L}$ is engaged in a packet transfer if and only if there are packets in $i$ that are useful to $j$. For a node $i$,
we define $N^-(i) = \{j \in \mathcal{N} : (j,i) \in \mathcal{L}\}$ and $N^+(i) = \{j \in \mathcal{N} : (i,j) \in \mathcal{L}\}$ to be the
'in-neighborhood' and 'out-neighborhood' of $i$ respectively. Now at a given time $t$, packets of a round $r$ can be in 3 states
under the algorithm (analogous to the notation of Massoulie et al. [28]):

1) Successfully aggregated, i.e., present only at aggregator $a$.

2) Idle, i.e., not being transmitted on any edge. Packets of an idle round $r$ are present at nodes of some set
$S \subseteq \mathcal{N}$, henceforth called the footprint-set of round $r$, and denoted $FP_r[t]$. We define a valid footprint-set
to be one where the subgraph induced by the set contains a spanning tree rooted at $a$ (equivalently, each node in the
footprint-set has a directed path to $a$); the collection of such sets is denoted as $\mathcal{S}$. Finally, for all
$S \in \mathcal{S}$, $X_S[t]$ is a count of idle rounds located in $S$.

3) Active, i.e., being transmitted on at least one edge. The collection of active rounds is given by
$\mathcal{A}[t] = \{R_1[t], R_2[t], \ldots, R_m[t]\}$, where each round $R_k[t]$ has an associated pair
$(FP_k[t], E_k[t]) \in \mathcal{S} \times 2^{\mathcal{L}}$; here $FP_k[t]$ is the footprint-set, and
$E_k[t] \subseteq \mathcal{L}$ is the set of edges on which packets of round $R_k[t]$ are being transmitted.

The pair $(\{X_S[t]\}_{S \in \mathcal{S}}, \mathcal{A}[t])$ forms a complete description of the system; we henceforth consider the
Markov chain on this system description for describing and analyzing the algorithm. Further, for ease of exposition, we suppress
the dependence on time whenever clear from context.

Now we can formalize the notion of a useful packet for transmission. We define an edge $(u,v)$ to be idle if
$(u,v) \notin E_r$ for all $r \in \mathcal{A}[t]$ (i.e., no packet is being transmitted on it). For a given idle edge $(u,v)$ at
time $t$, a packet of round $r$ (idle or active) is said to be useful if:

1) (Aggregation Condition) Both $u$ and $v$ are in $FP_r[t]$.

2) (Non-isolation Condition) For all $w \in FP_r[t] \cap N^-(u)$, there is an alternate route for aggregation, i.e.,
$|FP_r[t] \cap N^+(w)| \ge 2$.

Figure 1 illustrates the above conditions for determining whether a packet is useful with respect to a link. Note that the
definition of valid footprint-sets is consistent with definition of useful packets: by ensuring transmission of only useful packets, 
we ensure that the footprint of any round must be a valid footprint set (i.e., always containing a spanning tree rooted at a). 
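
The two conditions above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function name `is_useful`, the adjacency dictionaries, and the 4-node network (loosely modeled on the caption of Fig. 1, with the aggregator assumed to belong to the footprint) are all assumptions made for this example.

```python
def is_useful(u, v, footprint, out_nbrs, in_nbrs):
    """Check whether the round's packet at u is useful for the idle link (u, v).

    footprint: set of nodes currently holding a packet of the round
    out_nbrs / in_nbrs: dicts mapping a node to its out- / in-neighbours
    """
    # The link must exist in the graph.
    if v not in out_nbrs.get(u, ()):
        return False
    # Aggregation condition: both endpoints hold a packet of the round.
    if u not in footprint or v not in footprint:
        return False
    # Non-isolation condition: every in-neighbour w of u that holds the round
    # must have at least two out-neighbours in the footprint, so that it still
    # has a route once u's packet has moved on.
    for w in in_nbrs.get(u, ()):
        if w in footprint:
            routes = [x for x in out_nbrs.get(w, ()) if x in footprint]
            if len(routes) < 2:
                return False
    return True


# Hypothetical network: aggregator 0, links 1->0, 3->0, 2->1, 2->3.
out_nbrs = {1: [0], 2: [1, 3], 3: [0]}
in_nbrs = {0: [1, 3], 1: [2], 3: [2]}
footprint = {0, 2, 3}  # node 1 holds no packet of this round

print(is_useful(2, 1, footprint, out_nbrs, in_nbrs))  # aggregation condition fails
print(is_useful(3, 0, footprint, out_nbrs, in_nbrs))  # node 2 would be isolated
print(is_useful(2, 3, footprint, out_nbrs, in_nbrs))  # both conditions hold
```

In this sketch only link (2, 3) passes both checks, which matches the three cases listed in the figure caption.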

Fig. 1: Illustration of the notion of a 'useful' packet. For the single round in the above network, the packet is (a) not useful
for link (2, 1), which violates the aggregation condition (node 1 has no corresponding packet); (b) not useful for link (3, 0),
which violates the non-isolation condition (node 2's packet gets isolated); and (c) useful for link (2, 3).

Next we impose a work conservation requirement on the system in the following manner. Define
$X_{u \to v} = \sum_{S \in \mathcal{S} : v \in S, u \notin S} X_{S+u}$ to be the number of useful idle packets across edge $(u,v)$,
and similarly define the number of active packets at $u$ which are useful to $v$. Then we impose the following activity condition
on the network: for all $(u,v) \in \mathcal{L}$, one of the following is true:

i. $(u,v) \in E$ for some $(W, E) \in \mathcal{A}$;

ii. $X_{u \to v} = 0$ and the number of active packets at $u$ which are useful to $v$ is also 0;

or in words, an edge is active as long as there is at least one useful packet across it. We now describe the routing algorithm,
which performs random useful packet forwarding with aggregation while ensuring the activity condition. The routing is performed
whenever a link is idle.

Input: An idle link $(u,v)$, i.e., a link with no packet currently being transmitted on it.

Output: A routing decision of which packet to transmit on $(u,v)$.

Step 1: If there are no useful packets across $(u,v)$, leave the link idle.

Step 2: Otherwise, pick a useful packet uniformly at random and start transmitting it.

Algorithm 1: Random useful packet forwarding with aggregation for FMux computation in wireline networks. 
And finally we have the main theorem for the stability of the algorithm. 

Theorem 3. For a directed acyclic network operating under Algorithm 1, the network is stable if
$\lambda < (\log|\mathcal{R}(f)|)^{-1} S^*$, where $S^* = \min_{S \in \mathcal{S}} \sum_{v \in S} \sum_{u \notin S} c_{uv}$.

The proof closely follows the proof of Massoulie et al. [28], with appropriate modifications in order to perform aggregation
rather than broadcast. Similar to [28], it proceeds in three stages:

1) Defining the fluid limit of the Markov chain, and associated convergence results. 

2) Defining a Lyapunov function for the fluid system, and showing negative drift. 

3) Using the fluid Lyapunov and convergence results to show stability of the original system. 

The critical additions that we make are in the appropriate definition of a useful packet, and in identifying the appropriate 
counter variables that capture FMux aggregation in networks. Further, in Lemma 4 we derive a crucial combinatorial relation
between these counter variables parallel to the main lemma in [28]. The details of the proof are provided in Appendix A.
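
As a numerical aid for the quantity $S^*$ in Theorem 3, the sketch below computes a min-mincut for a small capacitated graph. It assumes the usual max-flow/min-cut equivalence, i.e., that $S^*$ equals the smallest, over sensor nodes, of the maximum flow from that node to the aggregator. The function names, the Edmonds-Karp routine, and the 3-node example are illustrative assumptions, not part of the paper.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow; capacity maps a directed edge (u, v) to its rate."""
    residual = dict(capacity)
    adj = {}
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)  # reverse residual edge
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        prev = {source: None}
        queue = deque([source])
        while queue and sink not in prev:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y not in prev and residual[(x, y)] > 0:
                    prev[y] = x
                    queue.append(y)
        if sink not in prev:
            return flow
        # Find the bottleneck along the path, then push flow along it.
        bottleneck = float("inf")
        y = sink
        while prev[y] is not None:
            bottleneck = min(bottleneck, residual[(prev[y], y)])
            y = prev[y]
        y = sink
        while prev[y] is not None:
            residual[(prev[y], y)] -= bottleneck
            residual[(y, prev[y])] += bottleneck
            y = prev[y]
        flow += bottleneck


def min_mincut(capacity, nodes, aggregator):
    """Smallest node-to-aggregator min-cut over all sensor nodes."""
    return min(max_flow(capacity, n, aggregator) for n in nodes if n != aggregator)


# Hypothetical example: aggregator 0, sensors 1 and 2.
capacity = {(1, 0): 3, (2, 0): 1, (2, 1): 1}
print(min_mincut(capacity, [0, 1, 2], 0))
```

Here node 1 can push 3 units to the aggregator and node 2 can push 2 (one directly, one via node 1), so the min-mincut of this toy network is 2.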

V. Scheduling With Aggregation-Tree Routing in Wireless Networks 

The presence of interference in wireless networks necessitates efficient scheduling of independent sets in addition to routing.
Given an SSS rule, we can modify Theorem 3 to show that random packet aggregation supports a rate up to the min-mincut
under the corresponding SSS rule. However, dynamic scheduling in order to achieve the optimal SSS rule $\pi^*$ (i.e., the one with
the largest min-mincut) needs an alternate routing technique.

We now describe an alternate approach to throughput-optimal dynamic scheduling and routing for in-network FMux computation
over wireless networks. Unlike wired networks, where routing was performed via random packet forwarding, we now
focus on schemes based on pre-allocating the route to be followed by the packets of each round, and then scheduling under
these routing constraints. Building on the intuition that the "correct" routing structures for FMux computation are spanning
trees rooted at the aggregator (henceforth referred to as aggregation trees), we split the algorithm into two components:

• A routing component that maps incoming rounds to aggregation trees. Once a round is assigned to a tree, its packets
follow the edges of the tree to the aggregator.

• A scheduling component that uses the knowledge of the next hop of each packet to determine an optimal independent set for
transmission.

The main result of this section is that there is a dynamic algorithm of this type that is throughput-optimal for wireless
networks. More specifically, we present a throughput-optimal algorithm based on 'greedy' routing (whereby the aggregation
tree is chosen in a greedy manner) and 'MaxWeight'-type scheduling (whereby links are scheduled according to a maximum
weighted independent set problem, with link weights determined by the queues).

Before presenting the general algorithm, we consider some specific example networks to give an intuition as to how the
algorithm is constructed; in particular we illustrate the scheduling and routing components separately. Finally, in Section V-B
we present the complete algorithm for general graphs, and prove its throughput-optimality.



A. Scheduling With Aggregation-Tree Routing for FMux Computation: Preliminaries and Some Examples 



In this section we give some examples to build some intuition for the general algorithm we present in Section V-B. Suppose
the network is a tree rooted at node $a$. For node $i \in \mathcal{N}$, we define $p(i)$ to be the (unique) parent node and $C(i)$
to be the set of immediate children nodes in the aggregation tree. Before specifying the queueing dynamics for this system, we
first need a lemma that reduces the space of all possible scheduling policies to a smaller set of policies for which we can write
the dynamics in a convenient manner.

A scheduling policy for tree aggregation is said to be of type aggregate-and-transmit, or Type-AT, if for every node $i$ and
every round $r$, a packet of round $r$ is transmitted from $i$ to $p(i)$ only after receiving the corresponding round $r$ packet
from every node $j \in C(i)$. A Type-AT policy thus prevents a node from transmitting a round to its parent until it has aggregated
all corresponding packets from its children; this is analogous to the non-isolation requirement in Section IV. Further, this
ensures that the flow on each edge of the tree is equal to the input rate of packets on that tree. Henceforth, we restrict to
Type-AT policies, which are sufficient by the following lemma.

Lemma 2. For an aggregation tree and a scheduling policy that stabilizes the system for a given refresh rate $\lambda$, there
exists a scheduling policy that stabilizes the system for the same refresh rate, and in addition, is of Type-AT.

Proof: Given any stabilizing policy, we can use a standard coupling argument to obtain a scheduling policy of Type-AT.
Whenever the policy transfers a packet violating non-isolation, the modified algorithm stores the packet at the same node. This
continues until the node has received all packets of that round from its children nodes. The next time the policy transmits a
packet of the same round from that node (which we know happens as the algorithm is stable), the modified algorithm transmits
the aggregated packet. Since each round starts off with $|\mathcal{N}|$ packets, the number of packets under
the modified algorithm is less than or equal to $|\mathcal{N}|$ times the number of packets under the non-Type-AT algorithm.
Since the original policy is stable, the modified policy is also stable, and is of Type-AT. ■
We now consider some example networks with N nodes, where the aggregator node a desires a function of the sensed data. 
Assume that each sensor node records a value from an ordered, finite set X, and the aggregator wants the MAX of these values 
(an FMux function). The computation at the nodes consists of taking all available packets of a given round, and retaining the 
one with the largest value. In the following examples, we focus on the routing and scheduling aspects of the problem: first 
we study how to schedule links to deal with interference under a single aggregation tree; next we allow for collections of 
aggregation trees with fixed flows and show how to mix flows across these trees; finally we show a simple example of how 
dynamic routing over many trees can be achieved. In the next section, we combine these to obtain a dynamic scheduling and 
routing algorithm for general network topologies. 
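
To make the per-node computation concrete, here is a small sketch of MAX aggregation along a fixed aggregation tree. The function name, the dictionary-based tree encoding, and the sample values are assumptions made for illustration; the sketch also assumes that every sensor node holds a value for the round.

```python
def aggregate_max(children, values, root):
    """Bottom-up MAX over an aggregation tree.

    children: dict mapping a node to the list of its children
    values:   dict mapping each sensor node to its reading for this round
    Returns the aggregated value at the root and the number of packets sent.
    """
    transmissions = 0

    def visit(node):
        nonlocal transmissions
        best = values.get(node)  # the aggregator may hold no reading
        for child in children.get(node, ()):
            packet = visit(child)   # child forwards a single aggregated packet
            transmissions += 1      # exactly one packet crosses each tree edge
            best = packet if best is None else max(best, packet)
        return best

    return visit(root), transmissions


# Hypothetical tree: aggregator 0 with children 1 and 2; node 1 has children 3 and 4.
children = {0: [1, 2], 1: [3, 4]}
values = {1: 7, 2: 3, 3: 9, 4: 5}
result, sent = aggregate_max(children, values, 0)
print(result, sent)
```

Each node retains only the largest value it has seen for the round, so every tree edge carries one packet per round regardless of the size of the subtree below it.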



Example 3 (Single Aggregation Tree): Consider a sensor network where the MAX is computed by combining data on a single
aggregation tree. We now modify the queueing model of Section II to ensure that a policy is Type-AT. Each node $i$ maintains
two queues: $\mathcal{Q}_i^{nu}[t]$, corresponding to 'not-useful' packets which are awaiting packets from $C(i)$ with the same
round index, and $\mathcal{Q}_i^{u}[t]$, corresponding to 'useful' packets which are ready for transmission to $p(i)$, having
received and calculated the MAX of all corresponding packets from nodes in $C(i)$. We also define
$\mathcal{Q}_i[t] = \mathcal{Q}_i^{u}[t] \cup \mathcal{Q}_i^{nu}[t]$, and use $Q_i^{u}[t]$, $Q_i^{nu}[t]$ and $Q_i[t]$ to
denote the cardinality of the appropriate queues.

Packets entering the network at node $i$ at time $t$ are stored in $\mathcal{Q}_i^{nu}[t]$, except in leaf nodes where they are
stored in $\mathcal{Q}_i^{u}[t]$. A node only transmits packets which are in $\mathcal{Q}_i^{u}[t]$ in order to ensure that the
policy is of Type-AT. When node $i$ receives packets corresponding to round $r$ from all nodes in $C(i)$, it retains the packet
with the maximum value and stores it in $\mathcal{Q}_i^{u}[t]$. Formally, the queue dynamics are written in terms of two sets:
$\mathcal{D}_{(i,p(i))}[t]$, which represents the packets transmitted from node $i$ to its parent in time slot $t$, and
$\mathcal{I}_i[t]$, which denotes the internal transfer of packets at node $i$ from unaggregated to aggregated
($r \in \mathcal{I}_i[t]$ if $r \in \mathcal{D}_{(j,i)}[t]$ for at least one $j \in C(i)$ and
$r \notin \bigcup_{j \in C(i)} \mathcal{Q}_j[t+1]$). The cardinality of $\mathcal{D}_{(i,p(i))}[t]$ is henceforth denoted as
$D_{i,p(i)}[t]$, which represents the number of packets transmitted over link $(i,p(i))$ in time slot $t$.
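
The queue bookkeeping described in this example can be sketched as follows. This is an illustrative reading of the Type-AT dynamics under stated assumptions, not the authors' model: the class `TypeATTree`, its method names, and the per-round `received` map are hypothetical, and rounds are identified by integer indices.

```python
from collections import defaultdict


class TypeATTree:
    """Useful / not-useful queues on a single aggregation tree (illustrative)."""

    def __init__(self, parent, aggregator):
        self.parent = parent            # sensor node -> its parent
        self.aggregator = aggregator
        self.children = defaultdict(set)
        for child, par in parent.items():
            self.children[par].add(child)
        self.q_u = {i: [] for i in parent}      # useful rounds, FIFO order
        self.q_nu = {i: set() for i in parent}  # rounds still awaiting children
        self.received = defaultdict(set)        # (node, round) -> children heard from
        self.completed = []                     # rounds aggregated at the root

    def arrive(self, r):
        """A new round r arrives at every sensor node simultaneously."""
        for i in self.parent:
            if self.children[i]:
                self.q_nu[i].add(r)     # non-leaf: wait for the children
            else:
                self.q_u[i].append(r)   # leaf: immediately useful

    def transmit(self, i, count):
        """Send up to `count` useful packets from node i to its parent."""
        p = self.parent[i]
        sent = self.q_u[i][:count]
        del self.q_u[i][:count]
        for r in sent:
            self.received[(p, r)].add(i)
            if self.received[(p, r)] == self.children[p]:
                if p == self.aggregator:
                    self.completed.append(r)
                else:
                    # Internal transfer: round r becomes useful at the parent.
                    self.q_nu[p].discard(r)
                    self.q_u[p].append(r)
        return sent


# Hypothetical tree: 2 -> 1, 3 -> 1, 1 -> 0 (aggregator).
net = TypeATTree({1: 0, 2: 1, 3: 1}, aggregator=0)
net.arrive(0)
net.transmit(2, 1)
net.transmit(3, 1)
print(net.q_u[1], net.q_nu[1])
```

Note that the transmissions from nodes 2 and 3 leave the total number of packets queued at node 1 unchanged: the round simply moves from the not-useful queue to the useful one.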

One observation regarding these dynamics is that, unlike in data networks, under Type-AT policies a packet transmission by
node $i$ does not change the total size of its parent's queues $Q_{p(i)}[t]$ (this is in general due to the FMux property).
Further, each unaggregated round in the network has a useful packet at some node. Thus, we obtain the following scheduling
algorithm, which is a modified version of the Backpressure policy [12] to account for these facts:

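Input: Time slot $t$, queue states $\{Q_i^{u}[t], Q_i^{nu}[t]\}_{i \in \mathcal{N}}$, incoming packets $A_i[t]$, admissible rate region $\Gamma$.
Step 1: Place incoming packets to sensor $i$ in $\mathcal{Q}_i^{nu}[t]$ for non-leaf nodes, and $\mathcal{Q}_i^{u}[t]$ for leaf nodes.
Step 2: Compute $c^*[t]$ as:

$c^*[t] = \arg\max_{c \in \Gamma} \sum_{i \in \mathcal{N}} Q_i^{u}[t] \, c_{i,p(i)}[t].$
Step 3: Consider node $i$. If $c^*_{i,p(i)}[t] > 0$ and $Q_i^{u}[t] > 0$, then transmit the first $D_{i,p(i)}[t]$ packets, where
$D_{i,p(i)}[t] = \min(c^*_{i,p(i)}[t], Q_i^{u}[t])$.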



The above example indicates how the algorithm chooses independent sets for a single class of packets. Next we consider 
a network which uses a collection of aggregation trees for routing, therefore requiring the algorithm to make an additional 
decision of which packet to transmit on a scheduled link. 



Example 4 (Multiple aggregation trees): Consider a network modeled by a directed graph where we restrict the routing to a
specified collection of aggregation trees. We assume that each tree $\tau$ has a pre-determined arrival rate $\lambda_\tau$ of
rounds on it. Each new round is associated with a given tree in accordance with the arrival rates, thereby completely specifying
the routing. In each time slot, flows from different trees can be scheduled for transmission. We first need some additional notation.

Let $\mathcal{T}$ be the set of spanning trees that are used for routing. Each incoming round is tagged with a specific aggregation
tree $\tau \in \mathcal{T}$, which specifies the route to be followed by packets of that round while calculating the MAX at each
node. Define $p^\tau(i)$, $C^\tau(i)$ to be the parent and children nodes of node $i$ on tree $\tau$. Also, define
$A[t] = \sum_{\tau \in \mathcal{T}} A^\tau[t]$ and $\lambda = \sum_{\tau \in \mathcal{T}} \lambda_\tau$
to be the given splitting of input traffic between the aggregation trees. The queueing model is an extension of the previous
model. Each node $i$ maintains two queues for each tree $\tau$: $\mathcal{Q}_i^{\tau,nu}[t]$, corresponding to unaggregated packets
which are awaiting packets from $C^\tau(i)$ with the same round index (not-useful), and $\mathcal{Q}_i^{\tau,u}[t]$, corresponding
to aggregated packets which are ready for transmission to $p^\tau(i)$ (useful). We use $Q_i^{\tau,u}[t]$ and $Q_i^{\tau,nu}[t]$ to
denote the cardinality of the appropriate queues. The queue update equations are similar to before.

The scheduling algorithm for this network is similar to the single aggregation tree, with the added step that the weight of a link
is now given by the maximum queue backlog over all queues competing for that link. Formally we have:

Input: Time slot $t$, queues $Q_i^{\tau,u}[t]$, ...
Step 1: Place incoming packets ...
Step 2: Calculate $P_{ij}[t] = \max_{\tau}$ ...
Step 3: Compute schedule $c^*[t]$ ...
Step 4: Consider link $(i,j)$. If $c^*_{ij}[t]$ ...



The above two examples indicate how the scheduling algorithm works when the routing is specified. As we mentioned
before, the routing component of the algorithm assigns incoming packets to aggregation trees. The challenge is to do so in a
dynamic manner, i.e., to route the packets based on network state alone, and not using pre-computed rates for each tree. This
routing decision is made in a 'greedy' manner. In the next example, we consider a simple network to illustrate this.
Example 5 (Complete Graph): Consider a network in the form of a complete graph of $N$ nodes labelled $\{0, 1, \ldots, N-1\}$,
with node 0 denoting the aggregator node (which again wants to calculate the MAX value of the data at all the other nodes), and
with each link having unit capacity. As we claimed earlier, the min-mincut of this network can be achieved by packing $N-1$
aggregation trees. In particular, consider the set of depth-2 trees $\{\tau_i\}_{i=1}^{N-1}$, where tree $\tau_i$ consists of nodes
$\{1, 2, \ldots, N-1\} \setminus \{i\}$ at the bottom level connected to node $i$, which is connected to the aggregator, i.e.,
node 0 (for example, consider the decomposition of a 5-node complete graph in Figure 2). These trees are clearly edge-disjoint,
and hence they can each support a load of 1 to achieve a tree packing of $N-1$ (for each edge of the graph, there is a single
such tree which traverses it; since all the edges have equal capacity, putting unit load on each tree gives a feasible packing).
Hence they are optimal.
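
The depth-2 tree decomposition above can be checked mechanically for small $N$. The sketch below builds the trees as sets of directed edges pointing toward the aggregator and verifies that no edge is shared; the function name and the edge encoding are assumptions made for this illustration.

```python
def depth_two_trees(n):
    """Build the depth-2 aggregation trees of a complete graph on nodes 0..n-1.

    Tree i (1 <= i <= n-1): every other sensor j sends to node i,
    and node i sends to the aggregator, node 0.
    Each tree is returned as a set of directed edges (sender, receiver).
    """
    trees = {}
    for i in range(1, n):
        edges = {(j, i) for j in range(1, n) if j != i}
        edges.add((i, 0))
        trees[i] = edges
    return trees


trees = depth_two_trees(5)
all_edges = [e for edges in trees.values() for e in edges]
# Edge-disjointness: no directed edge appears in more than one tree.
print(len(trees), len(all_edges), len(set(all_edges)))
```

For $N = 5$ this yields 4 trees with 4 edges each, and the 16 edges are all distinct, so unit load on every tree respects the unit link capacities.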

There are two ways to route packets on these trees. Since we know that the optimal load on each tree is 1 unit, we can
associate each incoming round of packets to tree $\tau_i$ with probability $\frac{1}{N-1}$. Alternately, when a new round of
packets arrives, we can load it on the tree $\tau_i$ that has the least total number of packets on it. Intuitively this scheme also
asymptotically achieves the appropriate load balancing. In the next section, we formalize this notion of 'greedy' tree-loading
for general graphs, and further show that it indeed does achieve the optimal tree-packing. A more subtle point is that we may
not a priori know the correct trees to route on (unlike in this example), and a surprising result is that it is sufficient to
perform greedy tree-loading over all aggregation trees and still remain throughput-optimal.



B. Scheduling With Aggregation-Tree Routing for FMux Computation: The General Algorithm 

Finally we present the complete dynamic algorithm for FMux computation. The algorithm separates the routing and 
scheduling components as follows: when a round of packets arrives in the network, we first 'load' all packets of the round on 
an aggregation tree (thereby fixing the routing); next, in each time slot, scheduling is done according to a modified MaxWeight 
policy. 

The routing is performed using a greedy tree-loading policy, wherein all incoming rounds in a time slot are loaded on the 
tree with smallest sum useful-queue, i.e., least number of useful packets. Formally, we have: 



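Input: Time slot $t$, queues $\{Q_i^{\tau,u}[t]\}_{i \in \mathcal{N}, \tau \in \mathcal{T}}$, incoming rounds.

Output: A routing decision associating each incoming round with a tree $\tau \in \mathcal{T}$.
Step 1: Calculate $W_\tau[t] = \sum_{i \in \mathcal{N}} Q_i^{\tau,u}[t]$ for all $\tau \in \mathcal{T}$.
Step 2: Find the minimum loaded tree $\tau^*[t]$ as:

$\tau^*[t] = \arg\min_{\tau \in \mathcal{T}} W_\tau[t].$
Step 3: Assign all incoming rounds to aggregation tree $\tau^*[t]$.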



Algorithm 2: Greedy tree-loading algorithm for FMux computation. 
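
A minimal sketch of the greedy tree-loading step is given below. The function name, the nested-dictionary layout of the useful-queue lengths, and the tree labels are illustrative assumptions; ties are broken uniformly at random, which is one admissible choice among many.

```python
import random


def greedy_tree_loading(useful_queues, rng=None):
    """Pick the aggregation tree with the smallest total useful-queue length.

    useful_queues: dict mapping a tree label to {node: useful-queue length}
    rng:           optional random.Random instance used only for tie-breaking
    """
    rng = rng or random.Random()
    # Step 1: total useful backlog W_tau on each tree.
    load = {tree: sum(per_node.values()) for tree, per_node in useful_queues.items()}
    # Step 2: trees attaining the minimum load.
    least = min(load.values())
    candidates = sorted(t for t, w in load.items() if w == least)
    # Step 3: all incoming rounds of this slot are assigned to the chosen tree.
    return rng.choice(candidates)


# Hypothetical queue state for three trees over sensor nodes 1 and 2.
queues = {
    "tau1": {1: 2, 2: 1},
    "tau2": {1: 0, 2: 1},
    "tau3": {1: 4, 2: 0},
}
print(greedy_tree_loading(queues))
```

With the sample state above the loads are 3, 1 and 4, so the incoming rounds are loaded onto the tree labelled `tau2`.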
The scheduling algorithm is similar to the MaxWeight policy [13], in that it picks a maximum-weight independent set with weights
given by the product of the rate and the maximum queue across an edge. Formally we have the following algorithm:



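Input: Time slot $t$, queues $\{Q_i^{\tau,u}[t]\}_{i \in \mathcal{N}, \tau \in \mathcal{T}}$, incoming packets $A_i[t]$, admissible rate region $\Gamma$.

Output: A scheduling decision for each link $(i,j)$.

Step 1: Place packets arriving on tree $\tau$ at node $i$ in $\mathcal{Q}_i^{\tau,nu}[t]$ for non-leaf nodes, and $\mathcal{Q}_i^{\tau,u}[t]$ for leaf nodes.

Step 2: Calculate $P_{ij}[t] = \max_{\tau \in \mathcal{T} : (i,j) \in \tau} Q_i^{\tau,u}[t]$. Also define $\tau^*(i,j)[t]$ as the tree which maximizes $P_{ij}[t]$.
Step 3: Compute schedule $c^*[t]$ as:

$c^*[t] = \arg\max_{c \in \Gamma} \sum_{(i,j) \in \mathcal{L}} P_{ij}[t] \, c_{ij}[t].$

Step 4: Consider link $(i,j)$. If $c^*_{ij}[t] > 0$, then transmit the first $\min(c^*_{ij}[t], Q_i^{\tau^*(i,j),u}[t])$ packets from $\mathcal{Q}_i^{\tau^*(i,j),u}[t]$.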



Algorithm 3: MaxWeight scheduling algorithm. 
For the sake of completeness, we note that in all the above algorithms, tie-breaking rules as well as the service discipline 
(i.e., among a set of multiple packets suitable for transmission, which one gets priority) are assumed to be random; this is 
done for the sake of convenience, and we note that there are many possible tie-breaking rules and service disciplines which 
would suffice. 
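
The scheduling step can be sketched as follows, under simplifying assumptions that are not part of the paper: the admissible rate region is replaced by a finite list of candidate rate vectors, rates and queue lengths are integers, and ties are broken by list order rather than at random. All names in the sketch are hypothetical.

```python
def maxweight_schedule(rate_region, q_useful, trees):
    """One slot of MaxWeight-type scheduling with per-tree useful queues.

    rate_region: list of feasible rate vectors, each a dict {link: rate}
                 (a finite stand-in for the admissible rate region)
    q_useful:    dict {(tree, node): useful-queue length}
    trees:       dict {tree: set of directed links (i, j)}
    Returns the chosen rate vector and, per scheduled link, the tree served
    and the number of packets to transmit.
    """
    links = sorted({link for edges in trees.values() for link in edges})
    weight, best_tree = {}, {}
    for (i, j) in links:
        # Link weight: largest useful backlog at i among trees using (i, j).
        users = sorted(t for t, edges in trees.items() if (i, j) in edges)
        t_star = max(users, key=lambda t: q_useful.get((t, i), 0))
        best_tree[(i, j)] = t_star
        weight[(i, j)] = q_useful.get((t_star, i), 0)
    # Choose the feasible rate vector with the largest total weight.
    c_star = max(rate_region, key=lambda c: sum(weight[l] * c.get(l, 0) for l in links))
    decision = {}
    for link, rate in c_star.items():
        if rate > 0 and link in best_tree:
            t = best_tree[link]
            decision[link] = (t, min(rate, q_useful.get((t, link[0]), 0)))
    return c_star, decision


# Hypothetical 3-node network with two trees and one active link per slot.
trees_example = {"a": {(1, 0), (2, 1)}, "b": {(2, 0), (1, 2)}}
backlog = {("a", 1): 3, ("a", 2): 1, ("b", 1): 0, ("b", 2): 5}
region = [{(1, 0): 1}, {(2, 0): 1}, {(2, 1): 1}, {(1, 2): 1}]
print(maxweight_schedule(region, backlog, trees_example))
```

In this toy instance link (2, 0) carries the largest weight (the backlog of 5 on tree `b` at node 2), so it is scheduled and one packet of tree `b` is sent.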

Now we can state and prove the throughput optimality of this algorithm. 

Theorem 4. The dynamic queue-based policy consisting of greedy tree loading (Algorithm 2) and MaxWeight scheduling
(Algorithm 3) stabilizes the system for any refresh rate $\lambda$ that is less than the maximum refresh rate $\lambda^*$.

Before proceeding further, we point out a particular novel aspect of the proof of this theorem. Similar to previous papers
[12], [13], we use a quadratic Lyapunov function for showing stability; however our technique for bounding the Lyapunov drift
is quite different from those used for point-to-point data. The difficulty arises from the fact that although Edmonds' Theorem
guarantees the existence of an optimal tree-packing for the network, the trees in this optimal packing are unknown to the
algorithm; consequently it is unclear whether routing over all trees could lead to instability via packet accumulation on trees
not involved in the optimal packing. We circumvent this by showing the existence of some intermediate tree packings between
the optimal and the desired refresh rates, which allow uniform bounding of the Lyapunov drift. We now present the complete
proof.

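Proof: We define a candidate Lyapunov function $V[t]$ as

$V[t] = \sum_{i \in \mathcal{N}} \sum_{\tau \in \mathcal{T}} (Q_i^{\tau,u}[t])^2,$

with corresponding Lyapunov drift given by

$\Delta_V[t] = E[V[t+1] - V[t] \mid Q[t]].$

Similar to before, we have that $V[t] \ge 0$ for all states of the system, and that $\Delta_V[t] < \infty$. We now need to show
that given $\delta > 0$, there exists $Q_{max}$ such that if $Q_i^{\tau,u}[t] > Q_{max}$ for some $i, \tau$, then
$\Delta_V[t] < -\delta$. Writing $\Delta Q_i^{\tau}[t] = Q_i^{\tau,u}[t+1] - Q_i^{\tau,u}[t]$, we have

$\Delta_V[t] = E[\sum_{i \in \mathcal{N}} \sum_{\tau \in \mathcal{T}} ((\Delta Q_i^{\tau}[t])^2 + 2 Q_i^{\tau,u}[t] \Delta Q_i^{\tau}[t]) \mid Q[t]],$

and defining $A_i^{\tau,u}[t]$ to be the arrival of useful packets on tree $\tau$ to node $i$, we have

$\Delta Q_i^{\tau}[t] = A_i^{\tau,u}[t] - D^{\tau}_{(i,p^\tau(i))}[t],$

and thus $(\Delta Q_i^{\tau}[t])^2 \le m_A + (L c_{max})^2$ (due to external arrivals plus inter-node transmissions). Let
$M_2 = N|\mathcal{T}|(m_A + (L c_{max})^2)$. Then we have

$\Delta_V[t] \le M_2 + 2 \sum_{i \in \mathcal{N}} \sum_{\tau \in \mathcal{T}} Q_i^{\tau,u}[t] \, E[A_i^{\tau,u}[t] - D^{\tau}_{(i,p^\tau(i))}[t] \mid Q[t]].$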

From the definition of $\lambda^*$, we know that there exists an optimal rate point
$\{c^*_{uv}\}_{(u,v) \in \mathcal{L}} \in \mathcal{CH}(\Gamma)$ and the corresponding optimal SSS rule $\pi^*$ that maximizes the
min-mincut. Consider now a refresh rate $\lambda$ less than $\lambda^*$, such that $\lambda^* - \lambda = \epsilon > 0$.
Note that the algorithm can potentially split the incoming flow $\lambda$ over every spanning tree of the network, in order to
dynamically arrive at the optimal packing. To uniformly bound the Lyapunov drift, we first need to construct two tree packings:
an 'achievable' packing {A^} such that X^tgT'^t' ^ ^ which serves as a proxy for the flow-splitting, and a 'near-optimal' 
packing {Ax} such that X^^er > A* — ^ and further which has the property that A,- — A^ > €4 uniformly over all spanning 
trees (for some £4 > which we define below). We do so as follows. 

Assume that there exists $c_{min} > 0$ such that if any edge $(u,v) \in \mathcal{L}$ is scheduled alone (i.e., $I' = \{(u,v)\}$),
then $c_{uv}(I') \ge c_{min}$ (this is simply a formal definition of the existence of a link). We can now perturb the optimal SSS
rule to get a new rate point $\{\tilde{c}_{uv}\}_{(u,v) \in \mathcal{L}} \in \mathcal{CH}(\Gamma)$ with the following two properties:

1) Every edge $(u,v) \in \mathcal{L}$ has capacity $\tilde{c}_{uv} \ge \epsilon_1 > 0$.

2) The min-mincut of the network at the rate point $\{\tilde{c}_{uv}\}_{(u,v) \in \mathcal{L}}$ is $\ge \lambda^* - \frac{\epsilon}{3}$.

This helps ensure that the 'near-optimal' tree packing can have some mass on each edge of the graph.

To construct the perturbed SSS rule it, consider die optimal SSS rule {7r*(7)}7e2:. We define T = {I G I : 7t*{I) > 0} 
(i.e., the set of independent sets that have some mass under tt*) and TTmin = min7gx'{7r*(J)} (which is > as the cardinality 
of |I| is finite). Now we reduce each 7r*(/), / € X' by e2 = minj^^^, ^^^^^^ }■ This reduces the min-mincut by at most |. 
To see this, note that the capacity of any edge {u, v) reduces from c* to c„„ where: 

Cuv > C*„(l - £2), 

f 

^ * _ ^utj*^ 

^ ^uv o\r\r ' 
'-'I*-'! '-'max 

>c* - — 
- 3|£| ■ 

Further, the maximum number of edges across the min-mincut is bounded by $L$. Thus the min-mincut of the network at the
rate point $\{\tilde{c}_{uv}\}_{(u,v) \in \mathcal{L}}$ is $\ge \lambda^* - \frac{\epsilon}{3}$.

Next, suppose C is the set of edges with zero flow under tt* . We now complete the definition of the perturbed SSS rule tt 
(using the fact that singleton edges are valid independent sets) as follows: 




€2 



/el', 

I = {{u,v)}\/{u,v)GjC', 
otherwise. 
To see that this is a valid SSS rule, note that $1 - \sum_{I \in \mathcal{I}'} \tilde{\pi}(I) = |\mathcal{I}'| \epsilon_2$, which is
the weight we have distributed equally over all links in $\mathcal{L}'$. The rate point under this SSS rule is henceforth denoted
as $\{\tilde{c}_{uv}\}$. Then for edges in $\mathcal{L}'$ we have $\tilde{c}_{uv} \ge \frac{|\mathcal{I}'| \epsilon_2 c_{min}}{|\mathcal{L}'|}$.
Now since there are only $L$ edges, each with positive capacity $\tilde{c}_{uv}$, there exists some $\epsilon_1 > 0$ such that every
edge $(u,v) \in \mathcal{L}$ has capacity $\tilde{c}_{uv} \ge \epsilon_1$ under SSS rule $\tilde{\pi}$. Finally, applying Edmonds'
Theorem (Theorem 1) on the network under $\tilde{\pi}$, we get a packing $\{\lambda^*_\tau(\tilde{\pi})\}_{\tau \in \mathcal{T}}$
such that we have

$\sum_{\tau \in \mathcal{T}} \lambda^*_\tau(\tilde{\pi}) \ge \lambda^* - \frac{\epsilon}{3}.$
Before proceeding further, we need the following definitions: 

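• $\mathcal{L}^* = \{(u,v) \in \mathcal{L} : \tilde{c}_{uv} - \sum_{\tau : (u,v) \in \tau} \lambda^*_\tau(\tilde{\pi}) = 0\}$.

• $\mathcal{T}^* = \{\tau \in \mathcal{T} : \lambda^*_\tau(\tilde{\pi}) > 0\}$.

• $\epsilon_3 = \min_{(u,v) \in (\mathcal{L}^*)^c} \{\tilde{c}_{uv} - \sum_{\tau : (u,v) \in \tau} \lambda^*_\tau(\tilde{\pi})\}$.

• $\lambda_{min} = \min_{\tau \in \mathcal{T}^*} \{\lambda^*_\tau(\tilde{\pi})\}$.

Note that $\epsilon_3 > 0$ as $\tilde{c}_{uv} \ge \epsilon_1$ and the packing is not tight on the finite set $(\mathcal{L}^*)^c$. Similarly, $\lambda_{min} > 0$.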

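Finally we can construct the tree packings (on the network under SSS rule $\tilde{\pi}$) that we need to bound the Lyapunov drift:
1) The 'achievable' tree packing $\{\lambda^a_\tau\}_{\tau \in \mathcal{T}}$ is defined as:

$\lambda^a_\tau = \max\{\lambda^*_\tau(\tilde{\pi}) - \frac{2\epsilon}{3|\mathcal{T}^*|}, 0\}$ if $\tau \in \mathcal{T}^*$, and $\lambda^a_\tau = 0$ if $\tau \notin \mathcal{T}^*$.

Then clearly $\{\lambda^a_\tau\}$ is a packing (as we are only removing mass from a valid packing) and further:

$\sum_{\tau \in \mathcal{T}} \lambda^a_\tau \ge \sum_{\tau \in \mathcal{T}} \lambda^*_\tau(\tilde{\pi}) - \frac{2\epsilon}{3} \ge \lambda^* - \epsilon = \lambda.$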



2) The 'near-optimal' tree packing, {AtItst is defined as: 
A. ' 



3|r-| 
A*(7r) - V 

min I ; 



rer*,A;(7r)< 3^, 



2|(r*)-|' 3|r*||(r*)<=|' |(r* 

First we need to show that this is a valid tree packing. To see this, note that the maximum load added on any edge is 

, eaj (since in the worst case, all the trees in {T*Y can contain some edge). For any edge 



bounded by min | - 



2 ' 3|r* 

in {C*Y, this is less than the slack (> ej, by definition) that was akeady present. For an edge in £*, we know at least one 
tree in T* contained it (as every edge in the graph has positive capacity under the SSS rule n), and hence we subtract a 
load of at least min | 2|(r*")'=| ' 3\T'-'\\{T')''\ }' which is again greater than the amount of load we add. Thus {At-ItsT is a 
valid packing. 



2|(r-)<=|' 3|r-||(r*)i 
Further we have that Y.reT^'^ - T^reT- > A* - 
In addition, defining 

1 ^ Aniin All 
' — ™in < — — — 

\3|r 



£4 



£3 



2 ' 2\{T*Yy ■i\T*\\{T*Yy \{T*Y 



we get that A^ - A'^ > £4 V r e T. 

Thus we have constructed the two tree packings we need. We now return to bounding the Lyapunov drift. From above, we 
have 



mm 



Now, let c 



\t\ be the rate for packets on aggregation tree t on link {i,p'^{i)) allocated by the policy in time slot t (thus 



[t] 



EE 



[t]). Then we have 



= EE' 
^EE' 
^EE' 



t]-max{c[p.(,)M-grM,0}|QM 



[t] — maxjcii 



Qr[t]M\m 



,2 

max' 
Further, from the definition of the policy, we know that



= E max ^ max {Ql'"W} c,,[i]|Q[i] 

cer ^ Ter:(j,j)e-r 

> max V max {Q]"'"[i]}E [q.MIQM] 

(i,j)eCTeT ■.{i,j)£T 
(where cjj[t] is any tree-packing of a given c e C'H(r)) 

> E E erM^^r 

(j,j)e£rer:(j,j)e-r 

where for any edge cjj represents any vahd split of Cij between trees lying on that edge, i.e., c e CH{r). In particular, 
therefore, we can use the tree packing {At-}t-£7- to get 

Combining inequahties, and defining M3 = N\T\ {mA + (Lcynax)^ + Lc^^^) , we get 

Finally, define $A^\tau[t]$ to be the rate of rounds arriving on tree $\tau$. Then from the greedy round-tree assignment algorithm,
and using the fact that each round results in exactly one useful packet at each node, we get



mm 

{A-[t]}:E,er^"[*l=^[*l 



E E^rM Wiw 



Q[t] 



<EEQrw^x- 



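Thus we get

$\Delta_V[t] \le M_3 - 2 \sum_{i \in \mathcal{N}} \sum_{\tau \in \mathcal{T}} Q_i^{\tau,u}[t] (\lambda^o_\tau - \lambda^a_\tau) \le M_3 - 2 \epsilon_4 \sum_{i \in \mathcal{N}} \sum_{\tau \in \mathcal{T}} Q_i^{\tau,u}[t],$

where $\lambda^a_\tau$ and $\lambda^o_\tau$ denote the 'achievable' and 'near-optimal' tree packings, respectively.

In order to have $\Delta_V[t] < -\delta$ if $Q_i^{\tau,u}[t] > Q_{max}$ for some $i, \tau$, we can choose
$Q_{max} > \frac{M_3 + \delta}{2\epsilon_4}$. Thus $V[t]$ is a valid Lyapunov function and by Foster's Theorem, our policy is
stabilizing for any $\lambda < \lambda^*$. ■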



VI. Discussion and Conclusions 

We have presented a queueing-based framework for in-network function computation. We have used this framework to gain 
insights into designing dynamic and distributed algorithms for in-network function computation and to quantify the performance 
gains over data-download. We have focused on a class of functions, the FMux functions, which exhibit maximum compression 
on aggregation, and for which we have used the parity and MAX functions as representative examples. For such functions 
we have developed scheduling and routing algorithms under different settings. For wireline networks, we have extended the
random routing scheme of Massoulie et al. [28] for aggregation. For wireless networks, we have provided fixed routing via
dynamic flow splitting along with MaxWeight-like scheduling, which is shown to be throughput-optimal.

The wireless algorithm, as presented, requires routing on all aggregation trees in order to achieve throughput optimality; 
this may not be practical in many networks due to the potentially exponential number of trees. However, as we showed in the 
example with the complete graph, one can obtain optimal tree packings with a much smaller number of trees (of the order of 
L) and one direction of future work is to show how such trees can be selected using simple rules in different networks. 

Generalizing these algorithms to deal with a broader class of functions, as well as studying the performance of the algorithms 
with respect to other metrics (delay, energy consumption, among others) are other topics for future work. 
Appendix A

Scheduling with Random Packet Forwarding: Detailed Proofs 

We now present the complete proof for the throughput-optimality of Algorithm 1 in directed acyclic graphs. Since the
proof closely follows the proof of Massoulié et al. [28], we do not go into complete details, but try mainly to highlight the
modifications we make in order to perform aggregation rather than broadcast.

First we need a lemma that ensures that under the useful packet transmission rule, each round of packets follows a spanning
tree. Recall that the footprint of a round of packets is defined as the set of nodes at which the packets of that round are present.
Further, recall that a set $S$ is said to be a valid footprint set if each node in $S$ has a path to $a$ in the subgraph induced by $S$;
the collection of such sets is denoted by $\mathcal{S}$. We assume throughout that $\mathcal{N} \in \mathcal{S}$, for otherwise the min-mincut is 0. Note that
since we operate in continuous time, only one packet transmission occurs at a given time with probability 1; further, we require
that the local state information is available at the time of making a routing decision. Now we have the following lemma:

Lemma 3. For a round of packets with footprint $S \in \mathcal{S}$, the transmission of a useful packet results in a new footprint $S'$
which is also a valid footprint set.

Proof: Since the underlying graph is directed acyclic, we re-label the nodes as $\{0, 1, 2, \ldots, |\mathcal{N}| - 1\}$ according to their
topological ordering, where node $0$ is the aggregator $a$, and all edges are from a higher numbered node to a lower numbered
node. Further, given a round of packets on a valid footprint set $S$, each node $k \in S$ has at least one route to $a$
using only nodes in $S$; for short, we refer to such a route as a path from $k$ to $a$ in $S$.

Since we are operating in continuous time, with probability 1 only one packet transmission occurs at a given time. Now
suppose a useful packet is transmitted on edge $(j, i)$, where $i < j$, resulting in a new footprint set $S' = S \setminus \{j\}$. For $S'$ to
be a valid footprint, we need that even after the transmission, each node $k \in S'$ has a path to $a$ in $S'$. To show this, we
partition the nodes in $S'$ into 3 classes:

• Node $k \in S'$ such that $k < j$ in the topological order: due to the topological ordering property, a path from $k$ to $a$ in $S$
cannot contain $j$, and is therefore unaffected by the packet transmission from $j$ to $i$.

• Node $k \in S'$, $k > j$ such that there exists a path from $k$ to $a$ in $S$ which does not include $j$: such a path is also unaffected
by the packet transmission from $j$ to $i$ and hence is still present in $S'$.

• Node $k \in S'$, $k > j$ such that all paths from $k$ to $a$ in $S$ pass through $j$: we show by contradiction that this case is
impossible under the rules of useful packet transmission. For any path from $k$ to $a$ in $S$, let $k' \le k$ be the node immediately
before $j$ (i.e., the path is $k \to \cdots \to k' \to j \to \cdots \to a$). Then $k'$ has no path to $a$ in $S$ that does not pass through $j$,
for otherwise we would have a path from $k$ to $k'$, and then to $a$, which does not pass through $j$. This means that $k'$ becomes
isolated upon transmission of the packet from $j$ to $i$, which violates the non-isolation condition of useful packet forwarding.

Thus we have that S' is a valid footprint set. ■ 
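The statement of Lemma 3 can be checked by brute force on a small example. The following is a minimal sketch, not the authors' algorithm: the 4-node DAG is hypothetical, and `is_useful` encodes one plausible reading of the non-isolation condition (every in-neighbour of the transmitting node that holds the round must keep another out-neighbour inside the footprint). Under these assumptions, the script enumerates every valid footprint and every useful transmission, and records any case where the new footprint is not valid.

```python
from itertools import combinations

# Hypothetical 4-node DAG: node 0 is the aggregator, and edges go from a
# higher-numbered node to a lower-numbered one (topological labelling).
OUT_EDGES = {1: [0], 2: [0, 1], 3: [1, 2]}
NODES = [0, 1, 2, 3]


def is_valid_footprint(footprint):
    """Every node in the footprint reaches node 0 through footprint nodes only."""
    if 0 not in footprint:
        return False
    reachable = {0}
    # Increasing label order visits each node after all of its out-neighbours.
    for k in sorted(footprint):
        if k != 0 and any(n in reachable for n in OUT_EDGES.get(k, [])):
            reachable.add(k)
    return reachable == set(footprint)


def is_useful(footprint, j, i):
    """Assumed local rule: sending j -> i must not isolate an in-neighbour of j."""
    if j == 0 or j not in footprint or i not in footprint:
        return False
    if i not in OUT_EDGES.get(j, []):
        return False
    for k in footprint:
        if j in OUT_EDGES.get(k, []):
            alternatives = [n for n in OUT_EDGES[k] if n in footprint and n != j]
            if not alternatives:
                return False
    return True


def check_lemma():
    """Return (number of useful transmissions checked, list of counterexamples)."""
    checked, bad = 0, []
    for size in range(1, len(NODES) + 1):
        for subset in combinations(NODES, size):
            footprint = frozenset(subset)
            if not is_valid_footprint(footprint):
                continue
            for j in footprint:
                for i in OUT_EDGES.get(j, []):
                    if is_useful(footprint, j, i):
                        checked += 1
                        if not is_valid_footprint(footprint - {j}):
                            bad.append((set(footprint), j, i))
    return checked, bad


checked, counterexamples = check_lemma()
print(checked, counterexamples)
```

An empty counterexample list on this graph is only a sanity check of the lemma under the assumed rule; it does not replace the proof.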
The main idea behind the proof in [28] was to define the 'footprint counter' variables to represent the state of the system,
and to consider an appropriate function of these that allowed translating the local decisions of the nodes in terms of global
graph parameters. In order to adapt the proof for broadcast to aggregation, we defined a similar collection of counter variables in Section
IV and now define their associated dynamics as follows.

• Arrival of new round: $X_{\mathcal{N}} \to X_{\mathcal{N}} + 1$. (This corresponds to adding a packet to the queue with footprint $\mathcal{N}$, as a packet
of the new round is simultaneously generated at all the nodes.)

• Completion of packet transfer: This is only for active packets, i.e., those currently under transmission. For an active packet
$r \in \mathcal{A}$ with corresponding $(FP_r, E_r)$ and $(u,v) \in E_r$, we have:

$$FP_r \leftarrow FP_r \setminus \{u\}, \qquad E_r \leftarrow E_r \setminus \{(u,v)\},$$
$$E_r = \emptyset \implies X_{FP_r} \leftarrow X_{FP_r} + 1.$$

(The first equation corresponds to removing the edge over which packet transmission was completed, and also updating
the footprint of the packet to drop the transmitting node $u$. The second updates the list of idle packets in case there is no other
instance of this packet being transmitted.)

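• Initiation of a new transfer at an idle link: The new packet is selected uniformly at random among the set of useful packets
at the node. If $(u,v) \notin E_r$ for all $r \in \mathcal{A}$, then a new packet transfer is formally described as follows:

- Select a useful packet of an idle round with footprint $S \in \mathcal{S}$, $v \in S$, $u \notin S$, with probability

$$p_S = \frac{X_S}{X_{+u-v} + X^{\mathcal{A}}_{+u-v}}.$$

- Select a useful packet of an active round $r \in \mathcal{A}$ with $(FP_r, E_r)$ with probability

$$p_r = \frac{1}{X_{+u-v} + X^{\mathcal{A}}_{+u-v}},$$

where the two terms in the denominator count the useful idle rounds and the useful active rounds on $(u,v)$, respectively.

- If an idle packet with footprint $S$ is selected: $X_S \leftarrow X_S - 1$, $\mathcal{A} \leftarrow \mathcal{A} \cup \{r\}$, with $(FP_r, E_r) = (S, \{(u,v)\})$. If a packet of
active round $r$ is selected, then $E_r \leftarrow E_r \cup \{(u,v)\}$.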

We note here that the node itself does not need to know these global counters to perform packet selection; rather, this 
emerges from the use of the random useful packet forwarding rule. The idea of relating the local packet selection rule to the 
global counters is crucial in proving the optimality of the algorithm. The local rules for checking whether a packet is useful
or not correspond to selecting packets whose global footprint obeys certain properties; picking a useful packet uniformly at
random therefore corresponds to picking a packet from such a useful global footprint with a probability proportional to the 
corresponding counter variable. 
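The correspondence between the local rule and the global counters can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the proof: the counter values and footprints are hypothetical, and `pick_useful_packet` stands in for the node's local uniform choice, which never looks at the counters directly. Pooling the useful packets and drawing one uniformly selects each footprint with probability equal to its counter divided by the total.

```python
import random
from fractions import Fraction


def footprint_probabilities(useful_counters):
    """Probability of each footprint when one useful packet is drawn uniformly."""
    total = sum(useful_counters.values())
    return {fp: Fraction(count, total) for fp, count in useful_counters.items()}


def pick_useful_packet(useful_counters, rng):
    """Local rule: draw one packet uniformly from the pooled useful packets."""
    pool = [fp for fp, count in useful_counters.items() for _ in range(count)]
    return rng.choice(pool)


# Hypothetical counters: number of useful rounds per footprint on one link.
counters = {frozenset({0, 1, 2}): 3, frozenset({0, 2}): 1}
probs = footprint_probabilities(counters)

rng = random.Random(0)
draws = [pick_useful_packet(counters, rng) for _ in range(20000)]
empirical = draws.count(frozenset({0, 1, 2})) / len(draws)
print(probs[frozenset({0, 1, 2})], round(empirical, 2))
```

The empirical frequency of a footprint should be close to its exact probability, which is the property the fluid analysis relies on.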

Observe that in order to determine the flow into a footprint set $S$, we need to consider the collection of sets which include
$S$ and have one extra node. We now define the fluid limits of the system. This is similar in spirit to the fluid limit of the
system in [28], so we try to use similar notation. The existence of the limit also follows immediately from their convergence
results, so we omit it due to lack of space and refer interested readers to [28] for technical details.

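a) The fluid limits of the system: The fluid trajectories $t \mapsto x_S(t)$, $S \in \mathcal{S}$, corresponding to the system are defined as
follows:

• $\forall (u,v) \in \mathcal{L}$, $\forall S$ s.t. $v \in S$, $u \notin S$, $\exists\, t \mapsto \phi_{S+u,(u,v)}(t)$ s.t.

$$x_{\mathcal{N}}(t) = x_{\mathcal{N}}(0) + \lambda t - \sum_{S \in \mathcal{S}:\, S+u = \mathcal{N} \text{ for some } u}\ \sum_{v \in S:\, (u,v) \in \mathcal{L}} \phi_{\mathcal{N},(u,v)}(t),$$

$$x_S(t) = x_S(0) + \sum_{u \notin S}\ \sum_{v \in S:\, (u,v) \in \mathcal{L}} \phi_{S+u,(u,v)}(t) - \sum_{S' \in \mathcal{S}:\, S'+u = S \text{ for some } u}\ \sum_{v \in S':\, (u,v) \in \mathcal{L}} \phi_{S,(u,v)}(t), \quad S \neq \mathcal{N}.$$

• Work Conservation: At almost every $t$, $\phi_{S+u,(u,v)}(t)$ is differentiable and if $x_{+u-v}(t) > 0$ (where $x_{+u-v}(t)$ is the fluid
trajectory associated with $X_{+u-v}$) then we have

$$\frac{d}{dt}\phi_{S+u,(u,v)}(t) = c_{uv}\,\frac{x_{S+u}(t)}{x_{+u-v}(t)}.$$

• $\phi_{S,(u,v)}(t)$ is non-decreasing and Lipschitz continuous with Lipschitz constant $c_{uv}$, and $\sum_{S \in \mathcal{S}:\, v \in S,\, u \notin S} \phi_{S+u,(u,v)}(t)$ is
$c_{uv}$-Lipschitz.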

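For any $y \in \mathbb{R}_+^{|\mathcal{S}|}$, let $S(y)$ denote the set of all fluid trajectories with initial condition $y$, so that $S(y) \subset C([0,\infty), \mathbb{R}_+^{|\mathcal{S}|})$. Further, we define
$\{X^N_S(t)\}_{S \in \mathcal{S}}$ as the state of the Markov chain with initial conditions $(X^N(0), \mathcal{A}^N(0))$, and $Y^N(t) = z_N^{-1} X^N(z_N t)$. Now, as in [28], for a
sequence of initial conditions $(X^N(0), \mathcal{A}^N(0))$, $N > 0$, such that for a sequence of positive numbers $(z_N)_{N \ge 0}$ with $\lim_{N \to \infty} z_N = \infty$
the limit

$$x(0) = \lim_{N \to \infty} z_N^{-1} X^N(0)$$

exists in $\mathbb{R}_+^{|\mathcal{S}|}$, we have that $\forall T > 0$, $\epsilon > 0$:

$$\lim_{N \to \infty} P\Big[\inf_{f \in S(x(0))}\ \sup_{t \in [0,T]} \|Y^N(t) - f(t)\| > \epsilon\Big] = 0.$$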

b) The fluid Lyapunov function: Next we define the candidate Lyapunov function that we use to analyze the stability of
the system. In [28], the function was defined in terms of queues (or counters) that counted all the packets whose footprint was
contained inside a set $S$. The advantage of these queues for studying broadcast was that their rate of increase was controlled
by external arrivals to the system, while they were drained due to transfers across the cut defined by the set $S$.

For the purpose of studying aggregation, we need to identify an equivalent set of queues to reflect the unique dynamics of 
the system. In particular, we consider for each set S a queue of all rounds whose footprints are not entirely contained within 
S. These queues (counters) exhibit similar properties to the ones considered for broadcast in that every incoming round is 
counted by all these queues (as every node in the network generates a packet), while the drain of these queues is controlled 
by flow across the cut defined by the set S. Formally, we have the following theorem: 

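Theorem 5. Let $\{x_S\}_{S \in \mathcal{S}}$ denote the fluid trajectories. $\forall S \in \mathcal{S}$, define:

$$x_{\not\subseteq S} = \sum_{S' \in \mathcal{S},\, S' \not\subseteq S} x_{S'}.$$

Then (given $\lambda$, $c_{uv}$) $\exists\, \beta_1, \beta_2, \ldots, \beta_{K-1} > 0$, $\epsilon > 0$ such that the Lyapunov function

$$L(\{x_S\}_{S \in \mathcal{S}}) = \max_{S \in \mathcal{S}} \beta_{|S|}\, x_{\not\subseteq S} \qquad (1)$$

verifies

$$L(x(t)) \le \max(0,\, L(x(0)) - \epsilon t). \qquad (2)$$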
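To make the Lyapunov function concrete, here is a small sketch that evaluates it on toy data. It assumes the reading in which the counter for a set S sums the fluid mass of all footprints not contained in S, and in which the maximum runs over the footprint sets present as keys of the dictionary; the fluid values and the weights in `beta` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def x_not_contained(x, S):
    """Total fluid mass of footprints that are not subsets of S."""
    return sum(value for footprint, value in x.items() if not footprint <= S)


def lyapunov(x, beta):
    """Maximum over footprint sets S of beta[|S|] times the mass not contained in S."""
    return max(beta[len(S)] * x_not_contained(x, S) for S in x)


# Hypothetical fluid state on three nested footprints (node 0 is the aggregator).
x = {
    frozenset({0}): 1.0,
    frozenset({0, 1}): 2.0,
    frozenset({0, 1, 2}): 3.0,
}
# Hypothetical weights indexed by set size.
beta = {1: 2.0, 2: 4.0, 3: 8.0}

print(lyapunov(x, beta))
```

Here the candidates are 2 x 5 for {0}, 4 x 3 for {0, 1}, and 8 x 0 for the full set, so the maximum is attained at {0, 1}. This shows how the increasing weights trade off against the shrinking counters.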

As in [28], before proving this theorem we first need a combinatorial lemma. This lemma and its proof parallel a
corresponding lemma in [28], with modifications to deal with aggregation and the $x_{\not\subseteq S}$ counter variables we have defined
above.



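Lemma 4. Let $\alpha > 0$ be fixed (but arbitrary). We define:

$$\beta_i = \left(1 + \frac{1}{\alpha}\right)^{i}, \quad i = 1, \ldots, K.$$

Then $\forall \{x_S\}_{S \in \mathcal{S}} \in \mathbb{R}_+^{|\mathcal{S}|}$, the following conditions hold:

1) $\forall S \in \mathcal{S}$, $v \in S$, $u \notin S$, we have

$$x_{+u-v} < (1+\alpha)^{-1} x_{\not\subseteq S} \implies \beta_{|S|+1}\, x_{\not\subseteq S+u} > \beta_{|S|}\, x_{\not\subseteq S}.$$

2) $\forall S \in \mathcal{S}$ such that $\forall v \in S, u \notin S$, $x_{+u-v} \ge (1+\alpha)^{-1} x_{\not\subseteq S}$, if $\exists\, v \in S, u \notin S$ and some $S' \not\subseteq S$, $v \in S'$, $u \notin S'$ such that
$x_{S'+u} > \alpha x_{+u-v}$, then

$$\beta_{|S \cup S'|}\, x_{\not\subseteq S \cup S'} > \beta_{|S|}\, x_{\not\subseteq S}.$$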

Note that Lemma 4 does not depend on the algorithm, or the fluid model in any way. It is a purely combinatorial property of
the way that the quantities are defined. In other words, any function mapping the sets $S \in \mathcal{S}$ to $\mathbb{R}_+$ obeys the lemma for any
$\alpha > 0$. Later we use the ability to control $\alpha$ to obtain uniform bounds on the Lyapunov drift.
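Proof: For the first condition, consider $S \in \mathcal{S}$, $v \in S$, $u \notin S$ such that

$$x_{+u-v} < (1+\alpha)^{-1} x_{\not\subseteq S}.$$

Then we have

$$x_{\not\subseteq S} = x_{\not\subseteq S+u} + x_{S+u} \le x_{\not\subseteq S+u} + x_{+u-v} < x_{\not\subseteq S+u} + (1+\alpha)^{-1} x_{\not\subseteq S},$$

and thus

$$\frac{\alpha}{1+\alpha}\, x_{\not\subseteq S} < x_{\not\subseteq S+u}.$$

However, from the definition of the $\beta_i$, we have that $\beta_i \frac{1+\alpha}{\alpha} = \beta_{i+1}$ for all $i = 1, 2, \ldots, K-1$. Hence we have that

$$\beta_{|S|+1}\, x_{\not\subseteq S+u} > \beta_{|S|}\, x_{\not\subseteq S}.$$

For the second condition, consider $S \in \mathcal{S}$ such that $\forall v \in S, u \notin S$, $x_{+u-v} \ge (1+\alpha)^{-1} x_{\not\subseteq S}$. Further, consider a set $S'$ such
that $S' \not\subseteq S$, $v \in S'$, $u \notin S'$ and satisfying

$$x_{S'+u} > \alpha x_{+u-v}.$$

Then we have

$$\beta_{|S \cup S'|}\, x_{\not\subseteq S \cup S'} \ge \beta_{|S \cup S'|}\, x_{S'+u} > \beta_{|S \cup S'|}\, \alpha\, x_{+u-v} \ge \beta_{|S \cup S'|}\, \alpha (1+\alpha)^{-1} x_{\not\subseteq S}.$$

Thus for our condition, we need

$$\beta_{|S \cup S'|}\, \frac{\alpha}{1+\alpha} \ge \beta_{|S|},$$

and noting the fact that the $\beta_i$ are increasing with $i$, it is sufficient to ensure

$$\frac{\beta_{i+1}}{\beta_i} \ge \frac{1+\alpha}{\alpha}, \quad \forall i = 1, 2, \ldots, K-1.$$

This in fact holds with equality because of our choice of $\beta_i$. Thus, given any $\alpha > 0$, we can construct $\beta_i$ such that the two
conditions hold. ■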
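The role of the weights in this argument can be verified with exact arithmetic. The sketch below assumes the weights are powers of (1 + 1/alpha), which is the reading consistent with the ratio condition holding with equality; the values of `alpha` and `K` are arbitrary illustrative choices. It checks that consecutive weights differ by exactly the factor (1 + alpha)/alpha, so that multiplying a weight by alpha/(1 + alpha) recovers the previous one, and that all weights exceed 1.

```python
from fractions import Fraction


def betas(alpha, K):
    """Weights beta_1..beta_K, assumed to be powers of (1 + 1/alpha)."""
    base = 1 + 1 / alpha
    return [base ** i for i in range(1, K + 1)]


alpha = Fraction(1, 3)   # arbitrary positive value for illustration
K = 4                    # arbitrary number of weights
b = betas(alpha, K)

ratios = [b[i + 1] / b[i] for i in range(K - 1)]
target = (1 + alpha) / alpha

# Each step up in set size gains exactly the factor lost to alpha / (1 + alpha).
recovered = [b[i + 1] * alpha / (1 + alpha) for i in range(K - 1)]
print(b, ratios, target)
```

Using `Fraction` avoids floating-point rounding, so the equality in the ratio condition can be tested exactly rather than approximately.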
Now we use Lemma 4 to prove Theorem 5. The steps of this proof closely follow the corresponding proof in [28].

Proof: (Proof of Theorem 5) Given $\alpha > 0$, we define $\beta_i$ as in Lemma 4. Then, for any $y \in \mathbb{R}_+^{|\mathcal{S}|}$, if $S^*$ is a set which
belongs to the arg-max of $\max_{S \in \mathcal{S}} \beta_{|S|}\, x_{\not\subseteq S}$, then $x_{\not\subseteq S^*} > 0$ (unless all the fluid sample paths are identically 0).
Next we use the optimality of $S^*$ to obtain some relations between $x_{\not\subseteq S^*}$ and the weight across its cut-edges. $\forall v \in S^*, u \notin S^*$
such that $(u,v) \in \mathcal{L}$, we have from the contrapositive of the first condition of Lemma 4 (as $S^*$ is in the arg-max) that

$$x_{+u-v} \ge (1+\alpha)^{-1} x_{\not\subseteq S^*}.$$

Similarly from condition 2, $\forall v \in S^*, u \notin S^*$, $S' \not\subseteq S^*$ such that $v \in S'$, $u \notin S'$, we have that

$$x_{S'+u} \le \alpha x_{+u-v}.$$


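Now we have

$$\frac{d}{dt} x_{\not\subseteq S^*} = \lambda - \sum_{v \in S^*,\, u \notin S^*}\ \sum_{S \subseteq S^*:\, v \in S} \frac{d}{dt} \phi_{S+u,(u,v)}(t)$$

$$= \lambda - \sum_{v \in S^*,\, u \notin S^*} c_{uv} \Big(1 - \sum_{S' \not\subseteq S^*,\, v \in S',\, u \notin S'} \frac{x_{S'+u}}{x_{+u-v}}\Big) \quad \text{(from the definition of fluid trajectories)}$$

$$= \lambda - \sum_{v \in S^*,\, u \notin S^*} c_{uv} + \sum_{v \in S^*,\, u \notin S^*}\ \sum_{S' \not\subseteq S^*,\, v \in S',\, u \notin S'} c_{uv}\, \frac{x_{S'+u}}{x_{+u-v}}$$

$$\le \lambda - \sum_{v \in S^*,\, u \notin S^*} c_{uv} + \sum_{v \in S^*,\, u \notin S^*}\ \sum_{S' \not\subseteq S^*,\, v \in S',\, u \notin S'} c_{uv}\, \alpha \quad \text{(from the previous observation)}$$

$$\le \lambda - \sum_{v \in S^*,\, u \notin S^*} c_{uv} + \max_{(u,v) \in \mathcal{L}} c_{uv}\, |\mathcal{L}|\, 2^{|\mathcal{N}|} \alpha.$$

If we choose $\alpha > 0$ small enough that the last term is strictly smaller than $\sum_{v \in S^*,\, u \notin S^*} c_{uv} - \lambda$ uniformly over the cuts (possible whenever $\lambda$ is strictly below the min-mincut), and $\epsilon > 0$ as the remaining margin,
then we get that, for all $S^* \in \arg\max_{S \in \mathcal{S}} \beta_{|S|}\, x_{\not\subseteq S}$,

$$\frac{d}{dt} x_{\not\subseteq S^*} \le -\epsilon.$$
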

To argue that this implies negative drift of the Lyapunov function, i.e. $L(x(t)) = \max_{S \in \mathcal{S}} \beta_{|S|}\, x_{\not\subseteq S} \le \max(0, L(x(0)) - \epsilon t)$,
we observe that by definition $\beta_{|S|} > 1$ $\forall S \in \mathcal{S}$. Finally, using the Lipschitz continuity of the trajectories, it is sufficient to
show this property holds for the sets $S^* \in \arg\max_{S \in \mathcal{S}} \beta_{|S|}\, x_{\not\subseteq S}$. ■
Finally we can prove Theorem 3 using the stability of the fluid limit process along with standard techniques from literature;
for technical details, see [28].



